(54) Cellulose synthase gene

(57) mRNA was extracted from cotton plant fiber cells at the stage of cellulose accumulation, and cDNAs
complementary thereto were synthesized to construct a cDNA library. A total of 750 clones were arbitrarily
selected from the library and subjected to random sequencing. From the nucleotide sequences obtained for
the respective clones, those having homology to an amino acid sequence deduced from the gene for cellulose
4-β-glucosyltransferase (bcsA) of the cellulose synthase operon of an acetic acid bacterium were selected.
Thus, DNA coding for cellulose synthase was obtained.

Description

Technical Field 

The present invention relates to a DNA coding for cellulose synthase originating from cotton plant (Gossypium 
hirsutum), a recombinant DNA containing the same, a transformed cell transformed with the DNA, and a method for 
controlling cellular cellulose synthesis. 

Background Art 

Cellulose is used for paper, woody structural materials, fiber, cloths, food, cosmetics, and pharmaceuticals, as well 




in biosynthesis of cellulose. The cellulose-related industry has hitherto been directed to such cellulose products as
have already been produced, and there has been no attempt to develop a new material based on an aspect of
biosynthesis. The mechanism of disease action exerted by pathogenic microorganisms on plants often results
from the inhibition of cellulose biosynthesis, as in Pyricularia oryzae (P. oryzae). Therefore, the addition of disease
resistance to the cellulose biosynthesis mechanism is agriculturally applicable and valuable. Further, cellulose is the
most abundant organic compound on the earth, and it is the sink in which the largest amount of CO2 in the atmospheric
air is fixed. Therefore, the genetic improvement of cellulose biosynthesis enzymes is also applicable to the industry
which is directed to the control of CO2 in the atmospheric air based on the use of cellulose as the sink.

In recent years, cDNAs originating from fiber cells of cotton plant have been randomly sequenced, and it has been
reported that full-length CelA1 and partial-length CelA2 probably represent cDNAs of cotton plant cellulose synthase,
in view of their homology to the bacterial cellulose synthase gene (bacterial BcsA) (Pear et al., Proceedings of the
National Academy of Sciences, USA (1996) 93, 12637-12642). The ability to bind UDP-glucose has been demonstrated for
CelA1. However, for CelA2, homology has been demonstrated only for the C-terminal amino acid sequence.

Disclosure of the Invention 

The present invention has been made in order to provide a new method for regulating cellulose production in
prokaryotic cells or eukaryotic cells, an object of which is to provide a DNA coding for cellulose synthase, a recombinant
DNA containing the same, a transformed cell transformed with the DNA, and a method for regulating cellular cellulose
synthesis.

The present inventors first extracted mRNAs from cotton plant fiber cells at the stage of cellulose accumulation, and
cDNAs complementary thereto were synthesized to construct a cDNA library. A total of 750 cDNA clones were arbitrarily
selected from the library and subjected to random sequencing. Six amino acid sequences, one for each reading frame,
were derived from the nucleotide sequence of each of the obtained clones, and those having homology to an amino acid
sequence obtained by translating the gene for cellulose 4-β-glucosyltransferase (bcsA) of the cellulose synthase operon
of an acetic acid bacterium were selected. As a result, genes classified into three types or groups were found, and they
were designated as PcsA1, PcsA2, and PcsA3 respectively (PcsA is an abbreviation of "Plant Cellulose Synthase A").

That is, the present invention lies in a DNA coding for any one of the following proteins (A) to (C):

(A) a protein having a cellulose synthase activity and comprising an amino acid sequence shown in SEQ ID NO:
2 or an amino acid sequence involving deletion, substitution, insertion, or addition of one or several amino acids
relevant to SEQ ID NO: 2;

(B) a protein having a cellulose synthase activity and comprising an amino acid sequence shown in SEQ ID NO:
4 or an amino acid sequence involving deletion, substitution, insertion, or addition of one or several amino acids
relevant to SEQ ID NO: 4; and

(C) a protein having a cellulose synthase activity and comprising an amino acid sequence shown in SEQ ID NO:
8 or an amino acid sequence involving deletion, substitution, insertion, or addition of one or several amino acids
relevant to SEQ ID NO: 8, and comprising an amino acid sequence shown in SEQ ID NO: 11 or an amino acid
sequence involving deletion, substitution, insertion, or addition of one or several amino acids relevant to SEQ ID
NO: 11.

In another aspect, the present invention provides a recombinant vector comprising all or a part of the DNA as
defined above, and a transformed cell transformed with the DNA as defined above.

In still another aspect, the present invention provides a method for regulating cellulose synthesis in a cell,
comprising the steps of introducing the DNA as defined above into the cell, and expressing RNA having a nucleotide
sequence homologous to the DNA as defined above or a nucleotide sequence complementary to the DNA as defined
above.

SEQ ID NO: 1 corresponds to a sequence of PcsA1, and SEQ ID NO: 3 corresponds to a sequence of PcsA2.
SEQ ID NO: 5 corresponds to a sequence of the 3'-side region of PcsA3, SEQ ID NO: 7 corresponds to a sequence of the
5'-side region of PcsA3, and SEQ ID NO: 9 corresponds to a sequence of the internal region of PcsA3.

It has been demonstrated, by expression in eukaryotic cells (animal cells and/or yeast), that PcsA1 and PcsA2 among
the DNAs described above are DNAs coding for cotton plant cellulose synthase. It has also been demonstrated
that an antibody thereagainst inhibits the cotton plant cellulose synthase activity in a cell-free system. Further,
PcsA3, which is different from PcsA1 and PcsA2, has been found. Each of these species was obtained only as a partial
clone at the stage of the clones obtained by the random sequencing, and the 5'-portion of the coding region was not
contained. Therefore, clones having the sequences of the 5'-portions were isolated in accordance with the 5'-RACE
method based on the use of PCR, and their sequences were determined. As a result of this operation, the sequences of
the 5'-portions corresponding to the partial-length clones were obtained for PcsA1 and PcsA2.

On the other hand, as for PcsA3, a sequence of a 5'-portion of another clone, which was considered to belong to
the same PcsA3 group, was obtained. The two sequences had extremely high homology, and hence they were
considered to have become multiple genes relatively recently, originating from an identical gene through the
process of duplication. Therefore, even when the two are combined with each other at corresponding portions to
construct a fused gene followed by expression, it is assumed that the activity and function of the produced enzyme
are not affected thereby.

As for PcsA1 and PcsA2, in order to obtain a full-length clone, primers were designed on the basis of the sequence
of the 5'-portion and the sequence of the 3'-portion of the partial-length clone to perform PCR. Thus, a clone containing
the ORF was obtained.

The template used for the RACE method may be either cDNA synthesized from mRNA or a phage library.
When the phage library is used, it is possible to use a sequence in the vector as a 5'-side primer.

As a result of random sequencing, 15 clones appeared to code for cellulose synthase, among which the seven clones
of PcsA2 were the most abundant. Expression was confirmed in eukaryotic cells (animal cells and/or yeast)
transformed with the cellulose synthase gene. As a result, the cellulose synthase activity was observed.

The present invention will be explained in detail below. 

<1> Preparation of cotton plant cDNA library

Cotton plant fiber cells at the stage of cellulose accumulation are preferably used as a material for extracting mRNA
to construct a cotton plant cDNA library. The method for extracting mRNA is not specifically limited, and any
ordinary method for extracting mRNA from plants may be adopted.

cDNA can be synthesized, for example, by using a poly(T) sequence, which is complementary to the poly(A) tail
existing at the 3'-terminal of mRNA, as a primer to synthesize complementary DNA by the aid of reverse transcriptase,
and then forming a double strand by the aid of DNA polymerase.

The method therefor is described, for example, in Molecular Cloning (Maniatis et al., Cold Spring Harbor
Laboratory). Alternatively, any of the cDNA synthesis kits commercially available from various companies may be
used.

Generally, the library is constructed by using a phage vector. A variety of commercially available vectors are usable.
However, it is preferable to use a vector such as the λZAP vector, with which it is unnecessary to perform recloning
from the vector and a plasmid for sequencing can be prepared immediately.

<2> Determination of nucleotide sequence of cDNA 

Clones are randomly selected from the obtained cDNA library to determine nucleotide sequences of inserts in the 
clones. The nucleotide sequence can be determined in accordance with the Maxam-Gilbert method or the dideoxy 
method. Among them, the dideoxy method is more convenient and preferred. 

The nucleotide sequence can be determined in accordance with the dideoxy method by using a commercially
available sequencing kit. Further, the use of an automatic sequencer makes it possible to determine the sequences of a
large number of clones in a short period of time.

It is unnecessary to determine the sequence for an entire length of the insert. It is enough to determine a length 
of nucleotide sequence which is considered to be sufficient to perform homology search. For example, in Examples 
described later on, the homology search as described below was performed when a sequence having not less than 
60 nucleotides was successfully determined. 



EP 0 875 575 A2 



<3> Homology search with gene data base 



The determined nucleotide sequence of each of the cDNA clones is used to perform the homology search against
known amino acid sequences of cellulose synthase, or nucleotide sequences of genes coding therefor, registered
in the gene data base. The cellulose synthase is exemplified by an enzyme encoded by the gene for cellulose
4-β-glucosyltransferase (BcsA) of the cellulose synthase operon of an acetic acid bacterium (Wong, H. C. et al.,
Proc. Natl. Acad. Sci. U.S.A., 87, 8130-8134 (1990); Accession No. M37202).

Usable data bases include, for example, GenBank, EMBL, and DDBJ, published respectively by
Los Alamos National Laboratory in the United States, the European Molecular Biology Laboratory, and the National
Institute of Genetics (Japan). Programs usable for the homology search include, for example,
commercially available DNA analysis software such as DNASIS (Hitachi Software Engineering Co., Ltd.) and GENE-





nected on Internet with NCBI (National Center for Biotechnology Information) to utilize mttp:/ 
BLAST/) BLAST (Basic Local Alignment Search Tool) so that high speed homology search is performed. 

The homology search is performed, for example, in accordance with the following algorithm. When the homology
search is performed for a nucleotide sequence, the nucleotide sequence to be investigated is compared with each
individual gene sequence included in the data base while being shifted one nucleotide at a time.
When six or more continuous nucleotides coincide, the homology score is calculated in accordance
with a homology score table (see, for example, M. Dayhoff, Atlas of Protein Sequence and Structure, vol. 5 (1978)).
The system is set so that sequences having a score not less than a certain value are picked up as candidates having
homology. Further, gaps may be introduced into the sequence to be investigated or into the gene sequence included
in the data base to optimize the alignment so that the score is maximized.
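The seed step of the nucleotide search described above can be sketched in Python as follows. This is only an illustrative sketch, not the program actually used: the function name `shared_seeds` and the toy sequences are hypothetical, and the scoring with a score table and the gap optimization are omitted. Only the rule that six continuous coincident nucleotides trigger a hit is taken from the text.

```python
from collections import defaultdict


def shared_seeds(query, subject, k=6):
    """Return (query_pos, subject_pos) pairs where k continuous nucleotides coincide.

    Positions are 0-based. k=6 follows the "six or more continuous
    nucleotides" rule; everything else here is an assumption.
    """
    # Index every k-mer of the data base sequence by its start position.
    index = defaultdict(list)
    for j in range(len(subject) - k + 1):
        index[subject[j:j + k]].append(j)

    # Shift along the query one nucleotide at a time and look up each k-mer.
    hits = []
    for i in range(len(query) - k + 1):
        for j in index.get(query[i:i + k], []):
            hits.append((i, j))
    return hits


# Hypothetical toy sequences, for illustration only.
print(shared_seeds("AAGGTTAGG", "TTAAGGTTAGGCC"))
```

Each reported pair marks a position from which a real search program would go on to compute a score and, where needed, introduce gaps.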

When the homology search is performed for an amino acid sequence, the nucleotide sequence to be investigated
is translated into amino acids in all six reading frames, including those of the complementary strand. The investigation
may then be performed in the same manner as for the nucleotide sequence. Specifically, it is possible to use blastx of
BLAST described above. For detailed techniques and conditions for the search, reference may be made to DDBJ
News Letter, No. 15 (February 1995).
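The six-frame conversion described above can be sketched as follows. This is a minimal illustration, not the program actually used (the text names blastx for that purpose). The function names are hypothetical, the standard genetic code is assumed, and the example input is the first 48 nucleotides of SEQ ID NO: 5 as they appear in the Sequence Listing.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ("*" = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def reverse_complement(seq):
    """Return the complementary strand read 5' to 3'."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate complete codons; unreadable codons become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translations(seq):
    """Return the six translations keyed '+1'..'+3' and '-1'..'-3'."""
    seq = seq.upper()
    frames = {}
    for strand, strand_seq in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(strand_seq[offset:])
    return frames


# First 48 nucleotides of SEQ ID NO: 5.
start_of_seq5 = "CCGACATTCGTGAAGGAGCGTCGAGCTATGAAGAGAGAATATGAAGAA"
frames = six_frame_translations(start_of_seq5)
print(frames["+1"])  # PTFVKERRAMKREYEE
```

Frame +1 reproduces the first sixteen residues listed for SEQ ID NO: 6 (Pro Thr Phe Val Lys Glu Arg Arg Ala Met Lys Arg Glu Tyr Glu Glu). Each of the six strings would then be compared with the known cellulose synthase sequence.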



<4> Isolation of cDNA clone of cotton plant cellulose synthase 



The clone obtained as described above does not necessarily contain the entire nucleotide sequence of the gene. In
such a case, the clone is used as a probe to perform screening by means of plaque hybridization. Thus, it is possible
to obtain a clone containing a full-length gene from the library. The specific procedure may be carried out with reference
to Molecular Cloning, second edition (Maniatis et al., Cold Spring Harbor Laboratory), 12.30 to 12.40.

When the obtained cDNA is deficient in the 5'-portion, the 5'-portion can also be obtained by synthesizing primers so
that the cDNA sequence is elongated toward the 5'-terminal, and performing RT-PCR by using mRNA as a
template.

As demonstrated in Examples described later on, the DNA of the present invention has been obtained as DNA
having homology to the known bacterial cellulose synthase gene. The DNA further codes for an amino acid sequence
Gln-Xxx-Xxx-Arg-Trp (SEQ ID NO: 12; Xxx is any amino acid), which is considered to form a UDP-glucose binding
domain, and has high homology in the vicinity thereof.
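Locating this motif in a deduced amino acid sequence can be sketched as follows. This is a hedged illustration only: the helper `find_motif` is hypothetical, and the motif is read as Gln, two arbitrary residues, Arg, Trp, which is consistent with the five-residue spans given for it in this description. The example fragment is residues 353 to 368 of SEQ ID NO: 6 as listed, written in one-letter code.

```python
import re

# Gln, any two residues, Arg, Trp in one-letter code.
MOTIF = re.compile(r"Q..RW")


def find_motif(protein, offset=0):
    """Return 1-based (start, end) positions of every motif match.

    `offset` is the number of residues preceding the fragment in the
    full sequence, so that positions can be reported in full-length numbering.
    """
    return [
        (m.start() + 1 + offset, m.end() + offset)
        for m in MOTIF.finditer(protein)
    ]


# Residues 353-368 of SEQ ID NO: 6:
# Arg Leu Asn Gln Val Leu Arg Trp Ala Leu Gly Ser Val Glu Ile Phe
fragment = "RLNQVLRWALGSVEIF"
print(find_motif(fragment, offset=352))  # [(356, 360)]
```

The reported span, 356 to 360, agrees with the amino acid numbers stated for PcsA3 in this description.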

The nucleotide sequences of the DNA of the present invention obtained as described above and the amino acid
sequences deduced from the nucleotide sequences are shown in SEQ ID NOs: 1 to 10 in Sequence Listing. SEQ ID
NOs: 1 and 3 show nucleotide sequences of PcsA1 and PcsA2 respectively. SEQ ID NOs: 2 and 4 show amino acid
sequences deduced from the nucleotide sequences of PcsA1 and PcsA2 respectively.

SEQ ID NOs: 5 and 6 show a nucleotide sequence of a clone (PcsA3-682) containing the 3'-side region of PcsA3 and
an amino acid sequence deduced from the nucleotide sequence respectively. SEQ ID NOs: 7 and 8 show a nucleotide
sequence of a 5'-portion (PcsA3-5') of another clone containing the 5'-side region of PcsA3 and an amino acid sequence
deduced from the nucleotide sequence respectively. SEQ ID NOs: 9 and 10 show a nucleotide sequence of a 3'-portion
(PcsA3-3') of the clone and an amino acid sequence deduced from the nucleotide sequence respectively (see Fig. 1).
That is, SEQ ID NO: 5 corresponds to the 3'-side region of PcsA3, SEQ ID NO: 7 corresponds to the 5'-side region of
PcsA3, and SEQ ID NO: 9 corresponds to the internal region of PcsA3. The overlapping portion of PcsA3-682 differs
from that of PcsA3-3' in 9 nucleotides in the nucleotide sequence and in 1 amino acid in the amino acid sequence. Figs.
3 and 4 show the comparison between the nucleotide sequences of PcsA3-682 and PcsA3-3'. SEQ ID NO: 11 shows
a combination of the amino acid sequences encoded by PcsA3-682 and PcsA3-3'.
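A comparison of two overlapping portions, of the kind summarized above as a count of differing nucleotides, can be sketched as follows. This is a minimal sketch under stated assumptions: the two portions are taken to be already aligned and of equal length, the function name `mismatch_positions` is hypothetical, and the toy sequences are invented for illustration and are not taken from PcsA3-682 or PcsA3-3'.

```python
def mismatch_positions(seq_a, seq_b):
    """Return the 1-based positions at which two aligned sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return [
        pos
        for pos, (a, b) in enumerate(zip(seq_a, seq_b), start=1)
        if a != b
    ]


# Hypothetical aligned overlap, for illustration only.
overlap_a = "ATGGCTTCA"
overlap_b = "ATGGCATCG"
differences = mismatch_positions(overlap_a, overlap_b)
print(len(differences), differences)  # 2 [6, 9]
```

The same function applies unchanged to one-letter amino acid strings, which is how the difference at the amino acid level could be counted.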

The sequence Gln-Xxx-Xxx-Arg-Trp (SEQ ID NO: 12) corresponds to amino acid numbers 710 to 714 in SEQ ID
NO: 2 for PcsA1, amino acid numbers 778 to 782 in SEQ ID NO: 4 for PcsA2, and amino acid numbers 356 to 360 in

530 535 540

His Asp Arg Tyr Ala Asn Arg Asn Val Val Phe Phe Asp Ile Asn Met

550 555
Leu Gly Leu Asp Gly Leu Gln Gly Pro Val Tyr Val Gly Thr Gly Cys

565 570 575 

Val Phe Asn Arg Gln Ala Leu Tyr Gly Tyr Asp Pro Pro Val Ser Glu
580 585 590

595 600 605

Cys Cys Gly Gly Ser Arg Lys Lys Ser Lys Lys Lys Gly Glu Lys Lys

610 615 620

Gly Leu Leu Gly Gly Leu Leu Tyr Gly Lys Lys Lys Lys Met Met Gly

Lys Asn Tyr Val Lys Lys Gly Ser Ala Pro Val Phe Asp Leu Glu Gl^

645 650 
Ile Glu Glu Gly Leu Glu Gly Tyr Glu Glu Leu Glu Lys Ser t£ Leu

660 665 670

Met Ser Gln Lys Asn Phe Glu Lys Arg Phe Gly Gln Ser Pro Val Phe

675 680 685

Ile Ala Ser Thr Leu Met Glu Asn Gly Gly Leu Pro Glu Gly Thr Asn

695 700
Ser Thr Ser Leu Ile Lys Glu Ala Ile His Val Ile Ser Cys Gly Tyr

710 715
Glu Glu Lys Thr Glu Trp Gly Lys Glu Ile Gly Trp Ile Tyr Gly Ser

725 730 735

Val Thr Glu Asp Ile Leu Thr Gly Phe Lys His Cys Arg Gly Trp

745 750

Lys Ser Val Tyr Cys Val Pro Lys Arg Pro Ala Phe Lys Gly Ser Ala

755 760 765

Pro Ile Asn Leu Ser Asp Arg Leu His Gln Val Leu Arg Trp Ala Leu

775 780
Gly Ser Val Glu Ile Leu Ser Arg His Cys Pro Leu Trp Tyr Gly

Tyr Gly Gly Lys Leu Lys Trp Leu Glu Arg Leu Ala Tyr Ile Asn Thr

805 q^q 

He Val Tyr Pro Phe Thr Ser Ilo r>^ T _ T 815 

QOO i^er lie Pro teu Leu Ala Tyr Cys Thr He 

ozo 830 
Pro Ala Val Cys Leu Leu Thr Gly Lys Phe Ile Ile Pro Thr Leu Ser

840 845

Asn Leu Thr Ser Val Trp Phe Leu Ala Leu Phe Leu Ser Ile Ile Ala

855 860 






Thr Gly Val Leu Glu Leu Arg Trp Ser Gly Val Ser Ile Gln Asp Trp
865 870 875 880

Trp Arg Asn Glu Gln Phe Trp Val Ile Gly Gly Val Ser Ala His Leu

885 890 895

Phe Ala Val Phe Gln Gly Leu Leu Lys Val Leu Ala Gly Val Asp Thr

900 905 910

Asn Phe Thr Val Thr Ala Lys Ala Ala Asp Asp Thr Glu Phe Gly Glu
915 920 925 




Ile Ile Leu Asn Met Val Gly Val Val Ala Gly Val Ser Asp Ala Ile
945 950 955 960

Asn Asn Gly Tyr Gly Ser Trp Gly Pro Leu Phe Gly Lys Leu Phe Phe

965 970 975

Ala Phe Trp Val Ile Leu His Leu Tyr Pro Phe Leu Lys Gly Leu Met

980 985 990

Gly Arg Gln Asn Arg Thr Pro Thr Ile Val Val Leu Trp Ser Ile Leu

995 1000 1005

Leu Ala Ser Ile Phe Ser Leu Val Trp Val Arg Ile Asp Pro Phe Leu

1010 1015 1020

Pro Lys Gln Thr Gly Pro Val Leu Lys Gln Cys Gly Val Glu Cys
1025 1030 1035 

(2) INFORMATION FOR SEQ ID NO: 5:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 2033 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA to mRNA
(vi) ORIGINAL SOURCE:

(A) ORGANISM: Gossypium hirsutum L.

(C) INDIVIDUAL ISOLATE: Coker312
(ix) FEATURE:

(A) NAME/KEY: CDS

(B) LOCATION: 1..1857

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5: 
CCG ACA TTC GTG AAG GAG CGT CGA GCT ATG AAG AGA GAA TAT GAA GAA 
Pro Thr Phe Val Lys Glu Arg Arg Ala Met Lys Arg Glu Tyr Glu Glu 

1 5 10 15

TTC AAG GTT AGG ATA AAT GCA CTT GTA GCC AAA GCC CAA AAG GTT CCT
Phe Lys Val Arg Ile Asn Ala Leu Val Ala Lys Ala Gln Lys Val Pro

20 25 30

CCA GAA GGG TGG ATC ATG CAA GAT GGG ACA CCA TGG CCA GGA AAC AAT 144
Pro Glu Gly Trp Ile Met Gln Asp Gly Thr Pro Trp Pro Gly Asn Asn

35 40 45

ACT AAA GAT CAC CCT GGT ATG ATT CAA GTA TTT CTC GGT CAA ACT GGA 192
Thr Lys Asp His Pro Gly Met Ile Gln Val Phe Leu Gly Gln Ser Gly

50 55 60

GGC CAT GAT ACC GAA GGA AAT GAG CTT CCT CGT CTC GTC TAT GTA TCT 240
Gly His Asp Thr Glu Gly Asn Glu Leu Pro Arg Leu Val Tyr Val Ser

65 70 75 80

CGA GAG AAA AGG CCT GGT TTC TTG CAT CAC AAG AAA GCT GGT GCC ATG 288
Arg Glu Lys Arg Pro Gly Phe Leu His His Lys Lys Ala Gly Ala Met

85 90 95

AAC GCC CTT GTT CGG GTC TCG GGG GTG CTC ACA AAT GCT CCT TTT ATG 336
Asn Ala Leu Val Arg Val Ser Gly Val Leu Thr Asn Ala Pro Phe Met

100 105 110

TTG AAC TTG GAT TGT GAC CAT TAT TTA AAT AAC AGC AAG GCT GTA AGA 384
Leu Asn Leu Asp Cys Asp His Tyr Leu Asn Asn Ser Lys Ala Val Arg

115 120 125

GAG GCT ATG TGT TTC TTG ATG GAC CCT CAA ATT GGA AGA AAG GTT TGC 432
Glu Ala Met Cys Phe Leu Met Asp Pro Gln Ile Gly Arg Lys Val Cys

130 135 140

TAT GTC CAA TTC CCT CAA CGT TTC GAT GGT ATT GAT AGA CAT GAT CGA 480
Tyr Val Gln Phe Pro Gln Arg Phe Asp Gly Ile Asp Arg His Asp Arg
145 150 155 160

TAT GCC AAT CGG AAC ACA GTT TTC TTT GAT ATT AAC ATG AAA GCT CTA 528
Tyr Ala Asn Arg Asn Thr Val Phe Phe Asp Ile Asn Met Lys Gly Leu

165 170 175

GAT GGT ATA CAA GGC CCT GTA TAT GTC GGC ACG GGG TGT GTT TTC AGA 576
Asp Gly Ile Gln Gly Pro Val Tyr Val Gly Thr Gly Cys Val Phe Arg

180 185 190

AGG CAA GCT CTT TAT GGT TAT GAA CCT CCA AAG GGA CCT AAG CGC CCG 624
Arg Gln Ala Leu Tyr Gly Tyr Glu Pro Pro Lys Gly Pro Lys Arg Pro

195 200 205

AAA ATG GTA ACC TGT GGT TGC TGC CCT TGT TTT GGA CGC CGC AGA AAG 672
Lys Met Val Thr Cys Gly Cys Cys Pro Cys Phe Gly Arg Arg Arg Lys

210 215 220 

GAC AAA AAG CAC TCT AAG GAT GCT GGA AAT GCA AAT GGT CTA AGC CTA 720 
Asp Lys Lys His Ser Lys Asp Gly Gly Asn Ala Asn Gly Leu Ser Leu 
225 230 235 240
GAA GCA GCC AAA GAT GAC AAG GAG TTA TTG ATG TCC CAC ATG AAC TTT 768
Glu Ala Ala Lys Asp Asp Lys Glu Leu Leu Met Ser His Met Asn Phe

245 250 255

GAA AAG AAA TTT GGA CAA TCA GCC ATT TTT GTA ACT TCA ACA CTG ATG 816
Glu Lys Lys Phe Gly Gln Ser Ala Ile Phe Val Thr Ser Thr Leu Met

260 265 270

GAA CAA GCT GGT GTC CCT CCT TCT TCA AGC CCC GCA GCT TTG CTC AAA 864
Glu Gln Gly Gly Val Pro Pro Ser Ser Ser Pro Ala Ala Leu Leu Lys
Glu Ala Ile His Val Ile Ser Cys Gly Tyr Glu Asp Lys Thr Glu Trp

290 295 300 

GGA AGC GAG CTT GGC TGG ATT TAG GGC TCG ATT ACA GAA GAT ATC TTA 960
Gly Ser Glu Leu Gly Trp Ile Tyr Gly Ser Ile Thr Glu Asp Ile Leu
305 310 315 320

ACA GGA TTC AAG ATG CAT TGC CGT GGA TGG AGA TCA ATA TAG TGC ATG 1008
Thr Gly Phe Lys Met His Cys Arg Gly Trp Arg Ser Ile Tyr Cys Met

325 330 335

CCA AAG TTG CCT GCA TTC AAG GGT TCA GCT CCC ATC AAT CTA TCG GAT 1056
Pro Lys Leu Pro Ala Phe Lys Gly Ser Ala Pro Ile Asn Leu Ser Asp

340 345 350

CGT CTA AAC CAA CTC CTT GGA TGG GCA CTC GGT TCT CTT GAA ATT TTC 1104
Arg Leu Asn Gln Val Leu Arg Trp Ala Leu Gly Ser Val Glu Ile Phe

355 360 365

TTT ACT CAT CAT TGC CCA GCA TGG TAT GGT TTC AAG GGA GGA AAG CTA 1152
Phe Ser His His Cys Pro Ala Trp Tyr Gly Phe Lys Gly Gly Lys Leu

370 375 380

AAA TGG CTT GAA CGA TTC GCA TAT GTC AAC ACA ACC ATC TAC CCC TTC 1200
Lys Trp Leu Glu Arg Phe Ala Tyr Val Asn Thr Thr Ile Tyr Pro Phe
385 390 395 400

ACA TCT TTA CCA CTT CTC GCC TAT TCT ACC CTA CCG GCA ATC TGT TTA 1248
Thr Ser Leu Pro Leu Leu Ala Tyr Cys Thr Leu Pro Ala Ile Cys Leu

405 410 415

CTT ACC GAT AAA TTT ATC ATG CCA CCG ATA AGC ACC TTT GCA ACT CTA 1296
Leu Thr Asp Lys Phe Ile Met Pro Pro Ile Ser Thr Phe Ala Ser Leu

420 425 430

TTC TTC ATT GCC TTG TTT CTT TCA ATC TTT GCA ACT GGT ATT CTC GAG 1344
Phe Phe Ile Ala Leu Phe Leu Ser Ile Phe Ala Thr Gly Ile Leu Glu

435 440 445

CTA AGG TGG ACT GGA CTA AGC ATT GAA GAA TGG TGG AGG AAT GAG CAA 1392
Leu Arg Trp Ser Gly Val Ser Ile Glu Glu Trp Trp Arg Asn Glu Gln

450 455 460 

TTT TGG GTC ATC GGT GGC ATT TCG GCA CAT TTG TTC GCT GTT ATC CAA 1440
Phe Trp Val Ile Gly Gly Ile Ser Ala His Leu Phe Ala Val Ile Gln
465 470 475 480

GGC TTG TTG AAA GTT CTA GCT GGT ATT GAC ACT AAT TTC ACT GTC ACA 1488
Gly Leu Leu Lys Val Leu Ala Gly Ile Asp Thr Asn Phe Thr Val Thr

485 490 495

TCC AAG GCA ACT GAT GAC GAG GAG TTC GGG GAA TTG TAT ACT TTC AAA 1536
Ser Lys Ala Thr Asp Asp Glu Glu Phe Gly Glu Leu Tyr Thr Phe Lys

500 505 510

TGG ACA ACC CTT CTA ATT CCT CCT ACT ACC GTC TTA ATC ATC AAT TTA 1584
Trp Thr Thr Leu Leu Ile Pro Pro Thr Thr Val Leu Ile Ile Asn Leu

515 520 525

CTC GCT GTC GTT GCA GGC ATC TCG GAT GCC ATA AAC AAT GGA TAC CAA 1632
Val Gly Val Val Ala Gly Ile Ser Asp Ala Ile Asn Asn Gly Tyr Gln

530 535 540

TCA TGG GGA CCT CTT TTT GGG AAG CTC TTC TTC TCT TTC TGG GTG ATT 1680
Ser Trp Gly Pro Leu Phe Gly Lys Leu Phe Phe Ser Phe Trp Val Ile
545 550 555 560

GTC CAT CTC TAT CCA TTC CTC AAA GGT TTA ATG GGG AGA CAA AAC CGG 1728

Val His Leu Tyr Pro Phe Leu Lys Gly Leu Met Gly Arg Gln Asn Arg

565 570 575

ACA CCA ACC ATT GTT GTT ATA TGG TCA GTG CTA TTG GCT TCA ATC TTC 1776
Thr Pro Thr Ile Val Val Ile Trp Ser Val Leu Leu Ala Ser Ile Phe

580 585 590

TCC TTG CTT TGG GTC CGA ATT GAT CCA TTT GTG ATG AAA ACC AAA GGA 1824
Ser Leu Leu Trp Val Arg Ile Asp Pro Phe Val Met Lys Thr Lys Gly
595 600 605

CCA GAC ACT ACA ATG TCT GGC ATT AAC TGT TGAAAAAAAA TCATCTTGOG 1874
Pro Asp Thr Thr Met Cys Gly Ile Asn Cys
610 615
TGCTTCTTTT AGATTATGGT ATGTGATGTA TGAACAAACA AGAATGGAGA TGCACAAGAC 1934

AGAATAAAAT TAGAGTGAAA GTTTTGTGTA GTTATATATT CATTCTACCA ACTATAAGTT 1994
TTGTCATTCA ATTGAAAATA GCTCAACTTT GTGATCAAA 2033

(2) INFORMATION FOR SEQ ID NO: 6:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 618 amino acids

(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide
(v) FRAGMENT TYPE: C-terminal fragment
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6:
Pro Thr Phe Val Lys Glu Arg Arg Ala Met Lys Arg Glu Tyr Glu Glu 

1 5 10 15

Phe Lys Val Arg Ile Asn Ala Leu Val Ala Lys Ala Gln Lys Val Pro

20 25 30

Pro Glu Gly Trp Ile Met Gln Asp Gly Thr Pro Trp Pro Gly Asn Asn
35 40 45

Thr Lys Asp His Pro Gly Met Ile Gln Val Phe Leu Gly Gln Ser Gly

50 55 60

Gly His Asp Thr Glu Gly Asn Glu Leu Pro Arg Leu Val Tyr Val Ser

65 70 75 80 

Arg Glu Lys Arg Pro Gly Phe Leu His His Lys Lys Ala Gly Ala Met 

85 90 95 

Asn Ala Leu Val Arg Val Ser Gly Val Leu Thr Asn Ala Pro Phe Met

100 105 110

Leu Asn Leu Asp Cys Asp His Tyr Leu Asn Asn Ser Lys Ala Val Arg

115 120 125

Glu Ala Met Cys Phe Leu Met Asp Pro Gln Ile Gly Arg Lys Val Cys

130 135 140

Tyr Val Gln Phe Pro Gln Arg Phe Asp Gly Ile Asp Arg His Asp Arg
145 150 155 160

Tyr Ala Asn Arg Asn Thr Val Phe Phe Asp Ile Asn Met Lys Gly Leu

165 170 175

Asp Gly Ile Gln Gly Pro Val Tyr Val Gly Thr Gly Cys Val Phe Arg

180 185 190

Arg Gln Ala Leu Tyr Gly Tyr Glu Pro Pro Lys Gly Pro Lys Arg Pro

195 200 205 

Lys Met Val Thr Cys Gly Cys Cys Pro Cys Phe Gly Arg Arg Arg Lys 

210 215 220 

Asp Lys Lys His Ser Lys Asp Gly Gly Asn Ala Asn Gly Leu Ser Leu 
225 230 235 240 

Glu Ala Ala Lys Asp Asp Lys Glu Leu Leu Met Ser His Met Asn Phe 

245 250 255 

Glu Lys Lys Phe Gly Gln Ser Ala Ile Phe Val Thr Ser Thr Leu Met

260 265 270

Glu Gln Gly Gly Val Pro Pro Ser Ser Ser Pro Ala Ala Leu Leu Lys

275 280 285

Glu Ala Ile His Val Ile Ser Cys Gly Tyr Glu Asp Lys Thr Glu Trp

290 295 300

Gly Ser Glu Leu Gly Trp Ile Tyr Gly Ser Ile Thr Glu Asp Ile Leu
305 310 315 320

Thr Gly Phe Lys Met His Cys Arg Gly Trp Arg Ser Ile Tyr Cys Met

325 330 335

Pro Lys Leu Pro Ala Phe Lys Gly Ser Ala Pro Ile Asn Leu Ser Asp
340 345 350

Arg Leu Asn Gln Val Leu Arg Trp Ala Leu Gly Ser Val Glu Ile Phe
355 360 365

Phe Ser His His Cys Pro Ala Trp Tyr Gly Phe Lys Gly Gly Lys Leu

370 375 380

Lys Trp Leu Glu Arg Phe Ala Tyr Val Asn Thr Thr Ile Tyr Pro Phe

385 390 395 400

Thr Ser Leu Pro Leu Leu Ala Tyr Cys Thr Leu Pro Ala Ile Cys Leu

405 410 415

Leu Thr Asp Lys Phe Ile Met Pro Pro Ile Ser Thr Phe Ala Ser Leu
420 425 430

Phe Phe Ile Ala Leu Phe Leu Ser Ile Phe Ala Thr Gly Ile Leu Glu
435 440 445

Leu Arg Trp Ser Gly Val Ser Ile Glu Glu Trp Trp Arg Asn Glu Gln

450 455 460

Phe Trp Val Ile Gly Gly Ile Ser Ala His Leu Phe Ala Val Ile Gln
465 470 475 480

Gly Leu Leu Lys Val Leu Ala Gly Ile Asp Thr Asn Phe Thr Val Thr

485 490 495

Ser Lys Ala Thr Asp Asp Glu Glu Phe Gly Glu Leu Tyr Thr Phe Lys

500 505 510

Trp Thr Thr Leu Leu Ile Pro Pro Thr Thr Val Leu Ile Ile Asn Leu

515 520 525

Val Gly Val Val Ala Gly Ile Ser Asp Ala Ile Asn Asn Gly Tyr Gln
530 535 540

Ser Trp Gly Pro Leu Phe Gly Lys Leu Phe Phe Ser Phe Trp Val Ile

545 550 555 560

Val His Leu Tyr Pro Phe Leu Lys Gly Leu Met Gly Arg Gln Asn Arg
565 570 575

Thr Pro Thr Ile Val Val Ile Trp Ser Val Leu Leu Ala Ser Ile Phe
580 585 590

Ser Leu Leu Trp Val Arg Ile Asp Pro Phe Val Met Lys Thr Lys Gly

595 600 605

Pro Asp Thr Thr Met Cys Gly Ile Asn Cys
610 615

(2) INFORMATION FOR SEQ ID NO: 7:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1086 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA to mRNA
(vi) ORIGINAL SOURCE:

(A) ORGANISM: Gossypium hirsutum L.




(A) NAME/KEY: CDS

(B) LOCATION: 24..1086

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7: 
GGCACGAGCT TTCATATCCT CCA ATG GAA GCC AGC GCC GGA CTC GTT GCG 50 

Met Glu Ala Ser Ala Gly Leu Val Ala 
1 5 

GGC TCT CAC AAC CGC AAT GAA CTT GTT GTC ATT CAT GGC CAT GAA GAG 98 
Gly Ser His Asn Arg Asn Glu Leu Val Val Ile His Gly His Glu Glu

10 15 20 25

CCT AAA CCT CTG AAG AAC TTG GAT GGT CAA GTT TGT GAG ATT TGT GGT 146
Pro Lys Pro Leu Lys Asn Leu Asp Gly Gln Val Cys Glu Ile Cys Gly

30 35 40

GAT GAA ATT GGG TTG ACG CTC GAT GGA GAT CTT TTC GTG GCC TGC AAC 194
Asp Glu Ile Gly Leu Thr Val Asp Gly Asp Leu Phe Val Ala Cys Asn

45 50 55

GAG TCT GGT TTT CCA GTT TGT AGG CCT TGT TAT GAG TAT GAA AGG AGA 242
Glu Cys Gly Phe Pro Val Cys Arg Pro Cys Tyr Glu Tyr Glu Arg Arg

60 65 70

GAA GGG ACT CAA CAA TGT CCT CAA TGC AAA ACT AGA TAC AAG CGT CTC 290
Glu Gly Ser Gln Gln Cys Pro Gln Cys Lys Thr Arg Tyr Lys Arg Leu

75 80 85

AAG GGG ACT CCG AGG GTG GAG GGA GAT GAA GAT GAA GAG GAT GTG GAT 338
Lys Gly Ser Pro Arg Val Glu Gly Asp Glu Asp Glu Glu Asp Val Asp

90 95 100 105

GAT ATC GAA CAT GAA TTC AAC ATT GAT GAT GAA CAA AAC AAG TAT AGA 386
Asp Ile Glu His Glu Phe Asn Ile Asp Asp Glu Gln Asn Lys Tyr Arg

110 115 120

AAT ATC GCT GAA TCG ATG CTT CAT GGA AAG ATG AGC TAC GGG AGA GGC 434
Asn Ile Ala Glu Ser Met Leu His Gly Lys Met Ser Tyr Gly Arg Gly

125 130 135

CCT GAA GAC GAT GAA GCT TTG CAA ATC CCA CCC GCT TTA GCT GGT GTT 482
Pro Glu Asp Asp Glu Gly Leu Gln Ile Pro Pro Gly Leu Ala Gly Val

140 145 150 

CGA TCT CGG CCG CTG AGC GGG GAG TTC CCA ATA GGA AGC TCT CTT GCT 530
Arg Ser Arg Pro Val Ser Gly Glu Phe Pro Ile Gly Ser Ser Leu Ala

155 160 165

TAT GGG GAA CAC ATG TCA AAT AAA CGA GTT CAT CCA TAT CCT ATG TCT 578
Tyr Gly Glu His Met Ser Asn Lys Arg Val His Pro Tyr Pro Met Ser
170 175 180 185

GAA CCT GGA AGT GCA AGA TGG GAT GAA AAG AAA GAG GGA GGA TGG AGA 626
Glu Pro Gly Ser Ala Arg Trp Asp Glu Lys Lys Glu Gly Gly Trp Arg
190 195 200

GAA AGG ATG GAT GAT TGG AAA ATG CAG CAA GGG AAT TTG GCT CCT GAA 674

Glu Arg Met Asp Asp Trp Lys Met Gln Gln Gly Asn Leu Gly Pro Glu

205 210 215

CCT GAT GAT GCC TAT GAT GCT GAC ATG GCT ATG CTT GAT GAA GCT AGG 722
Pro Asp Asp Ala Tyr Asp Ala Asp Met Ala Met Leu Asp Glu Ala Arg

220 225 230

CAG CCA TTG TCA AGG AAA GTG CCA ATT GCA TCG AGC AAA ATC AAT CCT 770
Gln Pro Leu Ser Arg Lys Val Pro Ile Ala Ser Ser Lys Ile Asn Pro
235 240 245

TAT CGT ATG GTG ATT GTG GCT CGT CTA GTT ATC CTT GCT TTC TTT CTT 818
Tyr Arg Met Val Ile Val Ala Arg Leu Val Ile Leu Ala Phe Phe Leu
250 255 260 265

CGC TAT GGG ATT TTG AAC CCG CTA CAT GAT GCA ATT GGG CTT TGG CTA 866

Arg Tyr Arg Ile Leu Asn Pro Val His Asp Ala Ile Gly Leu Trp Leu

270 275 280

ACT TCT GTG ATC TCT GAA ATC TGG TTT GCC TTT TCA TGG ATC CTT GAT 914
Thr Ser Val Ile Cys Glu Ile Trp Phe Ala Phe Ser Trp Ile Leu Asp

285 290 295

CAG TTC CCT AAA TGG TTC CCT ATT GAC CGC GAG ACG TAT CTC GAT CGC 962
Gln Phe Pro Lys Trp Phe Pro Ile Asp Arg Glu Thr Tyr Leu Asp Arg

300 305 310 

CTT TCC CTC AGG TAT GAG AGG GAA GGT GAG CCC AAC ATG CTT GCT TCT 1010 
Leu Ser Leu Arg Tyr Glu Arg Glu Gly Glu Pro Asn Met Leu Ala Ser 

315 320 325 

GTT GAT ATT TTT GTC AGT ACA GTG GAT OCA TTG AAG GGA CCT OCT CTA 1058 
Val Asp He Phe Val Ser Thr Val Asp Pro Leu Lys Gly Pro Pro Leu 
330 335 340 345 

CTA ACA GOG AAT ACA GTT CTA TOG ATC T 1086 
Val Thr Ala Asn Thr Val Leu Ser Ile
350 
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(2) INFORMATION FOR SEQ ID NO: 8: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 354 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 

(v) FRAGMENT TYPE: N- terminal fragment 




1 5 10 15

Leu Val Val Ile His Gly His Glu Glu Pro Lys Pro Leu Lys Asn Leu
20 25 30

Asp Gly Gln Val Cys Glu Ile Cys Gly Asp Glu Ile Gly Leu Thr Val
35 40 45

Asp Gly Asp Leu Phe Val Ala Cys Asn Glu Cys Gly Phe Pro Val Cys
50 55 60

Arg Pro Cys Tyr Glu Tyr Glu Arg Arg Glu Gly Ser Gln Gln Cys Pro
65 70 75 80

Gln Cys Lys Thr Arg Tyr Lys Arg Leu Lys Gly Ser Pro Arg Val Glu
85 90 95

Gly Asp Glu Asp Glu Glu Asp Val Asp Asp Ile Glu His Glu Phe Asn
100 105 110

Ile Asp Asp Glu Gln Asn Lys Tyr Arg Asn Ile Ala Glu Ser Met Leu
115 120 125

His Gly Lys Met Ser Tyr Gly Arg Gly Pro Glu Asp Asp Glu Gly Leu
130 135 140

Gln Ile Pro Pro Gly Leu Ala Gly Val Arg Ser Arg Pro Val Ser Gly
145 150 155 160

Glu Phe Pro Ile Gly Ser Ser Leu Ala Tyr Gly Glu His Met Ser Asn
165 170 175

Lys Arg Val His Pro Tyr Pro Met Ser Glu Pro Gly Ser Ala Arg Trp
180 185 190

Asp Glu Lys Lys Glu Gly Gly Trp Arg Glu Arg Met Asp Asp Trp Lys
195 200 205

Met Gln Gln Gly Asn Leu Gly Pro Glu Pro Asp Asp Ala Tyr Asp Ala
210 215 220

Asp Met Ala Met Leu Asp Glu Ala Arg Gln Pro Leu Ser Arg Lys Val
225 230 235 240

Pro Ile Ala Ser Ser Lys Ile Asn Pro Tyr Arg Met Val Ile Val Ala
245 250 255

Arg Leu Val Ile Leu Ala Phe Phe Leu Arg Tyr Arg Ile Leu Asn Pro
260 265 270

Val His Asp Ala Ile Gly Leu Trp Leu Thr Ser Val Ile Cys Glu Ile
275 280 285

Trp Phe Ala Phe Ser Trp Ile Leu Asp Gln Phe Pro Lys Trp Phe Pro
290 295 300

Ile Asp Arg Glu Thr Tyr Leu Asp Arg Leu Ser Leu Arg Tyr Glu Arg
305 310 315 320

Glu Gly Glu Pro Asn Met Leu Ala Ser Val Asp Ile Phe Val Ser Thr
325 330 335

Val Asp Pro Leu Lys Gly Pro Pro Leu Val Thr Ala Asn Thr Val Leu
340 345 350

Ser Ile
(2) INFORMATION FOR SEQ ID NO: 9:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1000 base pairs 

(B) TYPE: nucleic acid 
(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: cDNA to mRNA 
(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Gossypium hirsutum L. 

(C) INDIVIDUAL ISOLATE: Coker312 
(ix) FEATURE: 

(A) NAME/KEY: CDS 

(B) LOCATION: 1 . . 1000 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9: 

GAC AAA GTC CGG COG ACA TTC GTG AAG GAG CGT CGA GCT ATG AAG AGA 48 
Asp Lys Val Arg Pro Thr Phe Val Lys Glu Arg Arg Ala Met Lys Arg 

1 5 10 15 

GAA TAT GAA GAA TTC AAG GTT AGG ATA AAT GCA CTT GTA GCC AAA GCC 96 
Glu Tyr Glu Glu Phe Lys Val Arg Ile Asn Ala Leu Val Ala Lys Ala

20 25 30 

CAA AAG GTT CCT OCA GAA GOG TGG ATC ATG CAA GAT GGG ACA CCA TCG 144 
Gln Lys Val Pro Pro Glu Gly Trp Ile Met Gln Asp Gly Thr Pro Trp

35 40 45 

CCA GGA AAC AAT ACT AAA GAT CAC OCT GCT ATG ATT CAA GTA TTT CTC 192 
Pro Gly Asn Asn Thr Lys Asp His Pro Gly Met Ile Gln Val Phe Leu
50 55 60
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GCT CAA ACT GGA GGC CAT GAT AOC GAA GGA AAT GAG CTT OCT OCT CTC 240 
Gly Gln Ser Gly Gly His Asp Thr Glu Gly Asn Glu Leu Pro Arg Leu

65 70 75 80 

OTC TAT GTA TCT OGA GAG AAA AGG OCA GGT TIC TTG CAT CAC AAG AAA 288 
Val Tyr Val Ser Arg Glu Lys Arg Pro Gly Phe Leu His His Lys Lys

85 90 95 

GCT GOT GOC ATG AAC GOC CTT GTT CGT CTC TOG GGG GTG CTT ACA AAT 336 
Ala Gly Ala Met Asn Ala Leu Val Arg Val Ser Gly Val Leu Thr Asn 
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Ala Pro Phe Met Leu Asn Leu Asp Cys Asp His Tyr Leu Asn Asn Ser 

115 120 125 

AAG GCT GTA AGA GAG GCT ATG TCT TTC TTG ATG GAC CCT CAA ATT GGA 432 
Lys Ala Val Arg Glu Ala Met Cys Phe Leu Met Asp Pro Gln Ile Gly

130 135 140 

AGA AAG GTT TGC TAT GTC CAA TTC OCT CAA CGT TTC GAT GGT ATT GAT 480 
Arg Lys Val Cys Tyr Val Gln Phe Pro Gln Arg Phe Asp Gly Ile Asp
145 150 155 160 

AGA CAT GAT OGA TAT GOC AAT COG AAC ACA GOT TTC TIT GAT ATT AAC 528 
Arg His Asp Arg Tyr Ala Asn Arg Asn Thr Val Phe Phe Asp Ile Asn

165 170 175 

ATG AAA GGT CTA GAT GCT ATA CAA GGC CCT GTA TAT GTC GGC AGG GGG 576 
Met Lys Gly Leu Asp Gly Ile Gln Gly Pro Val Tyr Val Gly Thr Gly

180 185 190 

TCT GTT TTC AGA AGG CAA GCT CTT TAT GOT TAT GAA OCT OCA AAG GGA 624 
Cys Val Phe Arg Arg Gln Ala Leu Tyr Gly Tyr Glu Pro Pro Lys Gly

195 200 205 

OCT AAG OGC OCG AAA ATG CTA ACC TCT GCT TGC TGC CCT TGC ITT GGA 672 
Pro Lys Arg Pro Lys Met Val Thr Cys Gly Cys Cys Pro Cys Phe Gly 

210 215 220 

OGC CGC AGA AAG GAC AAA AAG CAC TCT AAG GAT GOT GGA AAT GCA AAT 720 
Arg Arg Arg Lys Asp Lys Lys His Ser Lys Asp Gly Gly Asn Ala Asn 
225 230 235 240 

GOT CTA AGC CTA GAA GCA GOC GAA GAT GAC AAG GAG TTA TTG ATG TOC 768 
Gly Leu Ser Leu Glu Ala Ala Glu Asp Asp Lys Glu Leu Leu Met Ser 

245 250 255 

CAC ATG AAC TTT GAA AAG AAA TTT GGA CAA TCA GOC ATT TTT GTA ACT 816 
His Met Asn Phe Glu Lys Lys Phe Gly Gln Ser Ala Ile Phe Val Thr

260 265 270 

TCA ACA CTG ATG GAA CAA GGT GCT GTC CCT CCT TCT TCA AGC OCT GCA 864 
Ser Thr Leu Met Glu Gln Gly Gly Val Pro Pro Ser Ser Ser Pro Ala
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275 280 285 

OCT TTG CTC AAA GAA GOC ATT CAT GTA ATT ACT TGT GOT TAT GAA GAC 912 

Ala Leu Leu Lys Glu Ala Ile His Val Ile Ser Cys Gly Tyr Glu Asp

290 295 300 

AAA ACC GAA TOG GGA AGC GAG CIT GGC TGG ATT TAC GGC TOG ATT ACA 960 
Lys Thr Glu Trp Gly Ser Glu Leu Gly Trp Ile Tyr Gly Ser Ile Thr

305 310 315 320

GAA GAT ATC TTA ACA GGT TTC AAG ATG CAT TGC OGT GGA T 1000 
Glu Asp Ile Leu Thr Gly Phe Lys Met His Cys Arg Gly
325 330 
(2) INFORMATION FOR SEQ ID NO: 10:
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 333 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 
(v) FRAGMENT TYPE: internal fragment 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10: 
Asp Lys Val Arg Pro Thr Phe Val Lys Glu Arg Arg Ala Met Lys Arg
1 5 10 15

Glu Tyr Glu Glu Phe Lys Val Arg Ile Asn Ala Leu Val Ala Lys Ala
20 25 30

Gln Lys Val Pro Pro Glu Gly Trp Ile Met Gln Asp Gly Thr Pro Trp
35 40 45

Pro Gly Asn Asn Thr Lys Asp His Pro Gly Met Ile Gln Val Phe Leu
50 55 60

Gly Gln Ser Gly Gly His Asp Thr Glu Gly Asn Glu Leu Pro Arg Leu
65 70 75 80

Val Tyr Val Ser Arg Glu Lys Arg Pro Gly Phe Leu His His Lys Lys
85 90 95

Ala Gly Ala Met Asn Ala Leu Val Arg Val Ser Gly Val Leu Thr Asn
100 105 110

Ala Pro Phe Met Leu Asn Leu Asp Cys Asp His Tyr Leu Asn Asn Ser
115 120 125

Lys Ala Val Arg Glu Ala Met Cys Phe Leu Met Asp Pro Gln Ile Gly
130 135 140

Arg Lys Val Cys Tyr Val Gln Phe Pro Gln Arg Phe Asp Gly Ile Asp
145 150 155 160

Arg His Asp Arg Tyr Ala Asn Arg Asn Thr Val Phe Phe Asp Ile Asn
165 170 175

Met Lys Gly Leu Asp Gly Ile Gln Gly Pro Val Tyr Val Gly Thr Gly
180 185 190

Cys Val Phe Arg Arg Gln Ala Leu Tyr Gly Tyr Glu Pro Pro Lys Gly
195 200 205

Pro Lys Arg Pro Lys Met Val Thr Cys Gly Cys Cys Pro Cys Phe Gly
210 215 220

Arg Arg Arg Lys Asp Lys Lys His Ser Lys Asp Gly Gly Asn Ala Asn
225 230 235 240

His Met Asn Phe Glu Lys Lys Phe Gly Gln Ser Ala Ile Phe Val Thr
260 265 270

Ser Thr Leu Met Glu Gln Gly Gly Val Pro Pro Ser Ser Ser Pro Ala
275 280 285

Ala Leu Leu Lys Glu Ala Ile His Val Ile Ser Cys Gly Tyr Glu Asp
290 295 300

Lys Thr Glu Trp Gly Ser Glu Leu Gly Trp Ile Tyr Gly Ser Ile Thr
305 310 315 320

Glu Asp Ile Leu Thr Gly Phe Lys Met His Cys Arg Gly
325 330
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(2) INFORMATION FOR SEQ ID NO: 11: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 622 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 

(v) FRAGMENT TYPE: C- terminal fragment 

(ix) FEATURE: 

(A) NAME/KEY: 

(B) LOCATION: 

(D) OTHER INFORMATION: Xaa indicates Glu or Lys 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11: 
Asp Lys Val Arg Pro Thr Phe Val Lys Glu Arg Arg Ala Met Lys Arg
1 5 10 15

Glu Tyr Glu Glu Phe Lys Val Arg Ile Asn Ala Leu Val Ala Lys Ala
20 25 30

Gln Lys Val Pro Pro Glu Gly Trp Ile Met Gln Asp Gly Thr Pro Trp
35 40 45

Pro Gly Asn Asn Thr Lys Asp His Pro Gly Met Ile Gln Val Phe Leu
50 55 60

Gly Gln Ser Gly Gly His Asp Thr Glu Gly Asn Glu Leu Pro Arg Leu
65 70 75 80

Val Tyr Val Ser Arg Glu Lys Arg Pro Gly Phe Leu His His Lys Lys
85 90 95

Ala Gly Ala Met Asn Ala Leu Val Arg Val Ser Gly Val Leu Thr Asn
100 105 110

Ala Pro Phe Met Leu Asn Leu Asp Cys Asp His Tyr Leu Asn Asn Ser
115 120 125

Lys Ala Val Arg Glu Ala Met Cys Phe Leu Met Asp Pro Gln Ile Gly
130 135 140

Arg Lys Val Cys Tyr Val Gln Phe Pro Gln Arg Phe Asp Gly Ile Asp
145 150 155 160

Arg His Asp Arg Tyr Ala Asn Arg Asn Thr Val Phe Phe Asp Ile Asn
165 170 175

Met Lys Gly Leu Asp Gly Ile Gln Gly Pro Val Tyr Val Gly Thr Gly
180 185 190

Cys Val Phe Arg Arg Gln Ala Leu Tyr Gly Tyr Glu Pro Pro Lys Gly
195 200 205

Pro Lys Arg Pro Lys Met Val Thr Cys Gly Cys Cys Pro Cys Phe Gly
210 215 220

Arg Arg Arg Lys Asp Lys Lys His Ser Lys Asp Gly Gly Asn Ala Asn
225 230 235 240

Gly Leu Ser Leu Glu Ala Ala Xaa Asp Asp Lys Glu Leu Leu Met Ser
245 250 255

His Met Asn Phe Glu Lys Lys Phe Gly Gln Ser Ala Ile Phe Val Thr
260 265 270

Ser Thr Leu Met Glu Gln Gly Gly Val Pro Pro Ser Ser Ser Pro Ala
275 280 285

Ala Leu Leu Lys Glu Ala Ile His Val Ile Ser Cys Gly Tyr Glu Asp
290 295 300

Lys Thr Glu Trp Gly Ser Glu Leu Gly Trp Ile Tyr Gly Ser Ile Thr
305 310 315 320

Glu Asp Ile Leu Thr Gly Phe Lys Met His Cys Arg Gly Trp Arg Ser
325 330 335

Ile Tyr Cys Met Pro Lys Leu Pro Ala Phe Lys Gly Ser Ala Pro Ile
340 345 350

Asn Leu Ser Asp Arg Leu Asn Gln Val Leu Arg Trp Ala Leu Gly Ser
355 360 365

Val Glu Ile Phe Phe Ser His His Cys Pro Ala Trp Tyr Gly Phe Lys
370 375 380

Gly Gly Lys Leu Lys Trp Leu Glu Arg Phe Ala Tyr Val Asn Thr Thr
385 390 395 400

Ile Tyr Pro Phe Thr Ser Leu Pro Leu Leu Ala Tyr Cys Thr Leu Pro
405 410 415

Ala Ile Cys Leu Leu Thr Asp Lys Phe Ile Met Pro Pro Ile Ser Thr
420 425 430

Phe Ala Ser Leu Phe Phe Ile Ala Leu Phe Leu Ser Ile Phe Ala Thr
435 440 445




Arg Asn Glu Gln Phe Trp Val Ile Gly Gly Ile Ser Ala His Leu Phe
465 470 475 480

Ala Val Ile Gln Gly Leu Leu Lys Val Leu Ala Gly Ile Asp Thr Asn
485 490 495

Phe Thr Val Thr Ser Lys Ala Thr Asp Asp Glu Glu Phe Gly Glu Leu
500 505 510

Tyr Thr Phe Lys Trp Thr Thr Leu Leu Ile Pro Pro Thr Thr Val Leu
515 520 525

Ile Ile Asn Leu Val Gly Val Val Ala Gly Ile Ser Asp Ala Ile Asn
530 535 540

Asn Gly Tyr Gln Ser Trp Gly Pro Leu Phe Gly Lys Leu Phe Phe Ser
545 550 555 560

Phe Trp Val Ile Val His Leu Tyr Pro Phe Leu Lys Gly Leu Met Gly
565 570 575

Arg Gln Asn Arg Thr Pro Thr Ile Val Val Ile Trp Ser Val Leu Leu
580 585 590

Ala Ser Ile Phe Ser Leu Leu Trp Val Arg Ile Asp Pro Phe Val Met
595 600 605

Lys Thr Lys Gly Pro Asp Thr Thr Met Cys Gly Ile Asn Cys
610 615 620

(2) INFORMATION FOR SEQ ID NO: 12: 
(i) SEQUENCE CHARACTERISTICS : 

(A) LENGTH: 9 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide
(v) FRAGMENT TYPE: internal fragment 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12: 
Gln Xaa Xaa Xaa Xaa Xaa Xaa Arg Trp
1 5 
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(2) INFORMATION FOR SEQ ID NO: 13: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid

(A) DESCRIPTION: /desc = "Synthetic DNA"
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13:
GAGAGAGAGA GAGAGAGAGA ACTAGTCTCG AGTTTTTTTT TTTTTTTTTT

(2) INFORMATION FOR SEQ ID NO: 14: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 13 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 

(A) DESCRIPTION: /desc = "Synthetic DNA"
(ix) FEATURE:

(A) NAME/KEY:

(B) LOCATION: 1..4

(D) OTHER INFORMATION: single strand 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14: 
AATTCGGCAC GAG 

(2) INFORMATION FOR SEQ ID NO: 15: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 

(A) DESCRIPTION: /desc = "Synthetic DNA" 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 15: 
GACTGAAGAT AAGCCAAAAG 

(2) INFORMATION FOR SEQ ID NO: 16: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 19 base pairs 

(B) TYPE: nucleic acid 
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(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 

(A) DESCRIPTION: /desc = "Synthetic DNA"
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 16:
GGAATGATGA ATTTOOOGG 

(2) INFORMATION FOR SEQ ID NO: 17: 



19 




(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid

(A) DESCRIPTION: /desc = "Synthetic DNA"
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 17: 
TGCAGGCAAC TTTGQCATGC 
(2) INFORMATION FOR SEQ ID NO: 18:
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 30 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 

(A) DESCRIPTION: /desc = "Synthetic DNA"
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 18:
AGCAACACGA GCAAGATGAG GAGGATGACT 
(2) INFORMATION FOR SEQ ID NO: 19:
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 28 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid

(A) DESCRIPTION: /desc = "Synthetic DNA"
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 19:
OCGGATOCIT CAACCCTTCT TCGATTTC 

(2) INFORMATION FOR SEQ ID NO: 20: 
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(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 29 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 

(A) DESCRIPTION: /desc = "Synthetic DNA"
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 20: 
OOQGATOCAC GGCAATGCAT CTTX3AAACC 

(2) INFORMATION FOR SEQ ID NO: 21: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 27 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: other nucleic acid

(A) DESCRIPTION: /desc = "Synthetic DNA"
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 21: 
GGTTAGCATA TTCTTTGTAG CATTCGG 

(2) INFORMATION FOR SEQ ID NO: 22: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 27 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 

(A) DESCRIPTION: /desc = "Synthetic DNA"
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 22:
ATCAATGAAA TATGTATAGT TCATAGC 

(2) INFORMATION FOR SEQ ID NO: 23:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 27 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: other nucleic acid 

(A) DESCRIPTION: /desc = "Synthetic DNA"
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 23: 
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CTTTOOTTCT TTTOGTTTTG OCATGGC 



27 



(2) INFORMATION FOR SEQ ID NO: 24: 



(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 27 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 








(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 24: 




Claims 



1. A DNA coding for any one of the following proteins (A) to (C):

(A) a protein having a cellulose synthase activity and comprising an amino acid sequence shown in SEQ ID 
NO: 2 or an amino acid sequence involving deletion, substitution, insertion, or addition of one or several amino 
acids relevant to SEQ ID NO: 2; 

(B) a protein having a cellulose synthase activity and comprising an amino acid sequence shown in SEQ ID 
NO: 4 or an amino acid sequence involving deletion, substitution, insertion, or addition of one or several amino 
acids relevant to SEQ ID NO: 4; and 

(C) a protein having a cellulose synthase activity and comprising an amino acid sequence shown in SEQ ID 
NO: 8 or an amino acid sequence involving deletion, substitution, insertion, or addition of one or several amino 
acids relevant to SEQ ID NO: 8, and an amino acid sequence shown in SEQ ID NO: 11 or an amino acid 
sequence involving deletion, substitution, insertion, or addition of one or several amino acids relevant to SEQ 
ID NO: 11. 

2. A recombinant vector comprising all or a part of the DNA as defined in claim 1.

3. A transformed cell transformed with the DNA as defined in claim 1.

4. A method for controlling cellulose synthesis in a cell, comprising the steps of introducing the DNA as defined in
claim 1 into the cell, and expressing RNA having a nucleotide sequence homologous to the DNA as defined in
claim 1 or a nucleotide sequence complementary to the DNA as defined in claim 1.
PcsA3

PcsA3-5'    PcsA3-3'

PcsA3-682

FIG. 1



SEQ ID NO: 14

5' AATTCGGCACGAG 3'
3'     GCCGTGCTC 5'



FIG. 2 
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10 20 30 40 50 60 

PcsA3-682 CCGACATTCGTGAAGGAGCGTCGAGCTATGAAGAGAGAATATGAAGAATTCAAGGTTAGG
(SEQ ID NO: 5) : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : :

PcsA3-3' CCGACATTCGTGAAGGAGCGTCGAGCTATGAAGAGAGAATATGAAGAATTCAAGGTTAGG
(SEQ ID NO: 9) 20 30 40 50 60 70



70 80 90 100 110 120 

PcsA3-682 ArAAArGCACTTGTAQOCAAAQCCCAAAAGGTTCCTCCAGAAGGGTGGArCATGCAAGAr 




PcsA3-682 GGGACACCATGGCCAGGAAACAATACTAAAGATCACCCTGGTATGATTCAAGTATTTCTC 



pcsA3-3* gggacao:atggccaggaaacaatactaaagatcaccctggtatgattcaagtatttctc 

140 150 1 60 170 180 190 

190 200 210 220 230 240 

PcM3-682 ggtcaaagtggaggccatgataccgaaggaaatgagcttcctcgtctcgtctatgtatct 



PcsA3-3* ggtcaaagtggagqccatgataccgaaggaaatgagcttcctcgtctcgtctatgtatct 

200 210 220 230 240 250 

250 260 270 280 290 300 

PcsA3-682 CGAGAGAAAAGGCCTGGTTTCTTGCATCACAAGAAAGCTGGTGCCATGMCGCCCTTGTT 



PcsA3-3' CGAGAGAAAAGGCCAGGTTTCTTGCATCACAAGAAAGCTGGTGCCATGMCGCCCTTGTT 
260 270 280 290 300 310 

310 320 330 340 350 360 

PcsA3-682 CGGGTCTCGGGGGTGCTCACAMTGCTCCTTTTATGTTGAACTTGGATTGTGACCATTAT 



PcsA3-3* CGTGTCTCGGGGGTGCTTACAAATGCTCCTTTTATGTTGAACTTGGATTGTGACCACTAT 
320 330 340 350 360 370 

370 380 390 400 410 420 

PcsA3-682 TTAAATAACAGCAAGGCTGTAAGAGAGGCTATGTGTTTCTTGATGGACCCTCAAATTGGA 



PcsA3-3* TTAAArAACAGCAAGGCTGTMGAGAGGCTATGTGTTTCTTGATGGACCC TC AAATTGGA 
380 390 400 410 420 430 

430 440 450 460 470 480 

PcsA3-682 AGAAAGGTTTGCTATGTCCAATTCCCTCAACGTTTCGATGGTATTGATAGACATGATCGA 



PcsA3-3' AGAAAGGTTTGCT AT GTCC AATTCCCTC AACGTTTCGATGGT AT T GAT AG AC ATGATCGA 
440 450 460 470 480 490 

490 500 510 520 530 540 

PcsA3-682 TATGCCAATCGGAACACAGTTTTCTTTGATATTAACATGAAAGGTCTAGATGGTATACAA 



PcsA3-3* TATGCCMTCGGAACACAGTTTTCTTTGATATTAACATGAMGGTCTAGATGGTATACAA 
500 510 520 530 540 550 



FIG. 3 
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550 560 570 580 590 600 

PcsA3-682 GGCCCTGTATATGTCGGCACGGGGTGTGTTTTCAGAAGGCAAGCTCTTTATGGTTATGAA
(SEQ ID NO: 5) : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : :

PcsA3-3' GGCCCTGTATATGTCGGCACGGGGTGTGTTTTCAGAAGGCAAGCTCTTTATGGTTATGAA
(SEQ ID NO: 9) 560 570 580 590 600 610

610 620 630 640 650 660 

PcsA3-682 CCTCCAAAGGGACCTAAGCGCCCGAAAATGGTAACCTGTGGTTGCTGCCCTTGTTTTGGA 



PcsA3-3' CCTCCAAAGGG^CCTMGCGCCCGAAAATGGTAACCTGTGGTTGCTGCCCTTGCTTTGGA 
620 630 640 050 660 670 

670 680 690 700 710 720 

PcsA3-682 CGCCGCAGAMGGACAAAAAGCACTCTAAGGATGGTGGAAATGCAAATGGTCTAAGCCTA 



PcsA3-3* CGCCGCAGAAA6GACAAAAAGCACTCTAAGGATGGTGGAAATGCAAATGGTCTAAGCCTA 
680 690 700 710 720 730 

730 740 750 760 770 780 

PcsA3-682 GAAGCAGCCAAAGATGACAAGGAGTTATTGATGTCCCACATGAACTTTGAAAAGAAATTT 



PcsA3-3' GAAGC AGCC GAAGAT GAC AAGGAG TT AT TGATGTCCCACATGAACTTTGAAAAGAAATT T 
740 750 760 770 780 790 

790 800 810 820 830 840 

PcsA3-682 GGACAATCAGCCATTTTTGTAACTTCAACACTGATGGAACAAGGTGGTGTCCCTCCTTCT 



PcsA3-3' GGACAATCAGCCATTTTTGTAACTTCAACACTGATGGAACAAGGTGGTGTCCCTCCTTCT 
800 810 820 830 840 850 

850 860 870 880 890 900 

PcsA3-682 TCMGCCCCGCAGCTTTGCTCAMGMGCCAnCATGTMTTAGTTGTGGTTATGAAGAC 



PcsA3-3' TCAAGCCCTGCAGCTTTGCTCAAAGAAGCCATTCATGTAATTAGTTGTGGTTATGAAGAC 
860 870 880 890 900 910 

910 920 930 940 950 960 

PcsA3-682 AAAACAGAATGGGGAAGCGAGCTTGGCTGGATTTACGGCTCGATTACAGAAGATATCTTA 



PcsA3-3' AAAACCGAATGGGGAAGCGAGCTTGGCTGGATTTACGGCTCGATTACAGAAGATATCTTA 
920 930 940 950 960 970 

970 980 
PcsA3-682 ACAGGATTCAAGATGCATTGCCGTGGAT 



PcsA3-3' ACAGGTTTCMGATGCATTGCCGTGGAT 
980 990 1000 



FIG. 4 



LACK OF UNITY OF INVENTION 



The Search Division considers that the present European patent application does not comply with the 
requirements of unity of invention and relates to several inventions or groups of inventions, namely: 



see sheet B 






None of the further search fees have been paid within the fixed time limit. The present European search 
report has been drawn up for those parts of the European patent application which relate to the invention 
first mentioned in the claims, namely claims: 

1-4 (partially) 




acid sequence shown in SEQ ID NO: 2 or an amino acid sequence
involving deletion, substitution, insertion, or addition of
one or several amino acids relevant to SEQ ID NO: 2, a
recombinant vector comprising all or part of said DNA, a
cell being transformed with said DNA, and a method for
controlling cellulose synthesis in a cell by the use of said
DNA.



2. Claims: 1-4 (partially)

Claims 1-4 (partially) refer to a DNA coding for a protein
having a cellulose synthase activity and comprising an amino
acid sequence shown in SEQ ID NO: 4 or an amino acid sequence
involving deletion, substitution, insertion, or addition of
one or several amino acids relevant to SEQ ID NO: 4, a
recombinant vector comprising all or part of said DNA, a
cell being transformed with said DNA, and a method for
controlling cellulose synthesis in a cell by the use of said
DNA.

3. Claims: 1-4 (partially)

Claims 1-4 (partially) refer to a DNA coding for a protein
having a cellulose synthase activity and comprising an amino
acid sequence shown in SEQ ID NO: 8 and in SEQ ID NO: 11 or an
amino acid sequence involving deletion, substitution,
insertion, or addition of one or several amino acids
relevant to SEQ ID NO: 8 and/or SEQ ID NO: 11, a recombinant
vector comprising all or part of said DNA, a cell being
transformed with said DNA, and a method for controlling
cellulose synthesis in a cell by the use of said DNA.
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SEQ ID NO: 6 for PcsA3. 

PcsA1 differs from CelA1 reported by Pear et al. (Proceedings of the National Academy of Sciences, USA (1996),
93, 12637-12642) by 28 nucleotides in nucleotide sequence. As a result, the amino acid sequences encoded by the
two differ by 10 amino acid residues. In general, the sugar chain specificity and the substrate
specificity of a sugar chain transferase are markedly changed by a point mutation of a nucleotide of the DNA (Yamamoto
and Hakomori, The Journal of Biological Chemistry (1990) 265, 19257-19262). Therefore, it is unclear whether or not
CelA1 codes for a protein having the cellulose synthase activity. Incidentally, the 48th Arg, the 56th Ser, the 81st Asn,
the 104th Ala, the 110th Ser, the 247th Asp, the 376th Asp, the 386th Ser, the 409th Arg, and the 649th Ser in the
amino acid sequence encoded by CelA1 correspond to Gln, Ile, Ser, Thr, Pro, Asn, Glu, Pro, His, and Gly in PcsA1,
respectively.
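
The ten residue correspondences listed above can be restated as a small lookup table. The following Python sketch only transcribes the positions and residues given in the text; the names CELA1_TO_PCSA1 and describe are illustrative and do not appear in the original disclosure.

```python
# Residue differences between CelA1 (Pear et al.) and PcsA1, as listed in the text.
# Keys are positions in the amino acid sequence encoded by CelA1;
# values are (residue in CelA1, corresponding residue in PcsA1).
CELA1_TO_PCSA1 = {
    48: ("Arg", "Gln"),
    56: ("Ser", "Ile"),
    81: ("Asn", "Ser"),
    104: ("Ala", "Thr"),
    110: ("Ser", "Pro"),
    247: ("Asp", "Asn"),
    376: ("Asp", "Glu"),
    386: ("Ser", "Pro"),
    409: ("Arg", "His"),
    649: ("Ser", "Gly"),
}


def describe(position):
    """Return a one-line description of the difference at a CelA1 position."""
    cela1, pcsa1 = CELA1_TO_PCSA1[position]
    return f"{cela1}{position} in CelA1 corresponds to {pcsa1} in PcsA1"


# The table holds the 10 differing residues stated in the text.
print(len(CELA1_TO_PCSA1))
print(describe(48))
```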

PcsA2 of the present invention contains the same sequence as that of CelA2 reported by Pear et al. However, 
CelA2 has an incomplete length, and it does not contain the entire coding region. CelA2 corresponds to nucleotide 
numbers of 1083 to 3311 in the nucleotide sequence of PcsA2 shown in SEQ ID NO: 3. 

Any of the amino acid sequences shown in SEQ ID NOs: 2, 4, 6, 8, 10, and 11 is a novel sequence. All genes 
having nucleotide sequences coding for the amino acid sequences are included in the present invention. 

The amino acid sequences described above may include deletion, substitution, insertion, and/or addition of one
or more amino acid residues provided that the characteristic of the gene of the present invention is not substantially
affected. The deletion, substitution, insertion, and/or addition of one or more amino acid residues as described above
is obtainable by modifying the DNAs coding for the amino acid sequences shown in SEQ ID NOs: 2, 4, 6, 8, 10, and
11 randomly in accordance with an ordinary mutation treatment or intentionally in accordance with the site-directed
mutagenesis method. As described above, in general, the sugar chain specificity and the substrate specificity of a
sugar chain transferase are markedly changed by a point mutation of a nucleotide of the DNA. Therefore, DNA coding
for a protein having the cellulose synthase activity is selected from the modified DNAs. The cellulose synthase activity
can be measured, for example, by the method described by T. Hayashi, "Measuring β-glucan deposition in
plant cell walls", in Modern Methods of Plant Analysis: Plant Fibers, eds. H. F. Linskens and J. F. Jackson,
Springer-Verlag, 10: 138-160 (1989).

Cotton plants harboring proteins or genes partially different from the sequences shown in Sequence Listing may exist
depending on, for example, the variety of cotton plant or natural mutation. However, such genes are also included in
the gene of the present invention. Such a gene may be obtained as DNA which is hybridizable under a stringent
condition with all or a part of the coding region of the nucleotide sequence shown in SEQ ID NO: 1, 3, 5, 7, or 9. The
"stringent condition" referred to herein indicates a condition under which a so-called specific hybrid is formed and a
non-specific hybrid is not formed. It is difficult to express such a condition definitely by a numerical value. However,
for example, the stringent condition is exemplified by a condition under which nucleic acids having high homology, for
example, DNAs having homology of not less than 80 %, hybridize with each other, and nucleic acids having
homology lower than the above do not hybridize with each other.
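
The 80 % figure above can be illustrated with a simple identity count. The following Python sketch assumes the two sequences are already aligned to the same length without gaps, and treats percent identity as a rough stand-in for the "homology" of the text; actual hybridization behavior also depends on temperature, salt concentration, and probe length, which this sketch ignores. The function names and the example fragments are hypothetical.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percentage of identical positions in two pre-aligned, equal-length sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    matches = sum(1 for a, b in zip(seq_a.upper(), seq_b.upper()) if a == b)
    return 100.0 * matches / len(seq_a)


def meets_homology_threshold(seq_a: str, seq_b: str, threshold: float = 80.0) -> bool:
    """True when identity is not less than the threshold (80 % in the text)."""
    return percent_identity(seq_a, seq_b) >= threshold


# Hypothetical 10-nucleotide fragments that differ at a single position.
probe = "ATGGCTTCAA"
target = "ATGGCATCAA"
print(percent_identity(probe, target))          # 9 of 10 positions match
print(meets_homology_threshold(probe, target))
```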

<5> Utilization of gene of the present invention 

The DNA of the present invention makes it possible to control the cellulose synthesis in prokaryotic cells such as
acetic acid bacteria and/or eukaryotic cells such as yeasts belonging to, for example, the genus Saccharomyces, cells of
plants such as cotton plant, and cultured cells of mammals and the like.

Specifically, the cellulose synthesis in the cells as described above can be facilitated, for example, by connecting 
a promoter to an upstream region of the DNA of the present invention, inserting an obtained fragment into an appropriate 
vector to construct a recombinant vector, and introducing the vector into the cells. Alternatively, the cellulose synthesis 
in the cells can be suppressed by introducing an antisense gene of the DNA of the present invention into the cells. 
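
The sequence relationship behind an antisense gene can be sketched as follows: the antisense RNA is the reverse complement of the sense strand, written as RNA. This Python sketch shows only that relationship, under the assumption of a plain A/C/G/T input; it does not model how an antisense construct is actually built or expressed. The function names and the example fragment are hypothetical.

```python
# Translation table mapping each DNA base to its complement.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return dna.translate(_COMPLEMENT)[::-1]


def antisense_rna(sense_dna: str) -> str:
    """Return the RNA complementary to the sense strand, written 5' to 3'."""
    return reverse_complement(sense_dna).upper().replace("T", "U")


# Hypothetical sense-strand fragment, for illustration only.
sense = "ATGGCTTCA"
print(reverse_complement(sense))
print(antisense_rna(sense))
```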

The promoter and the vector may be selected from those ordinarily utilized to express heterologous genes, and
the method ordinarily employed to express heterologous genes may be used as the transformation method.
Specifically, in the case of yeast, it is possible to use a protein expression kit produced by Invitrogen, i.e., Pichia Expression
Kit, and the vector pPIC9 contained in this kit. For example, COS7 cells may be used as mammalian cultured cells, and
the vector CDM8 may be used therefor.

The present invention provides the DNA coding for cellulose synthase. The DNA provides a new method for con- 
trolling cellulose production by incorporating the DNA into prokaryotic cells and eukaryotic cells. 

Brief Description of the Drawings 

Fig. 1 shows a relationship between two clones of PcsA3 as an embodiment of the DNA of the present invention. 
Regions interposed between arrows indicate regions for which nucleotide sequences have been determined. A dotted 
line indicates a region for which no nucleotide sequence has been determined. 
Fig. 2 shows the structure of the EcoRI adapter.

Fig. 3 shows comparison between sequences of PcsA3-682 and PcsA3-3' (former half). 

Fig. 4 shows comparison between sequences of PcsA3-682 and PcsA3-3' (latter half). ":" indicates coincident
nucleotides, and a# " indicates non-coincident nucleotides. 



Best Mode for Carrying Out the Invention 

Examples of the present invention will be explained below. 
<1> Preparation of total RNA from cotton plant



Cotton plant (Gossypium hirsutum L.) Coker 312 was used as a material. Fiber cells on 16 to 18 days post anthesis



which 375 mg of DTT as a powder was added, followed by addition of 200 ml of XT buffer (obtained by adjusting 0.2 
M sodium borate containing 30 mM EDTA and 1 % SDS to be pH 9.0, and then applying a diethylpyrocarbonate 
treatment, followed by autoclaving to obtain a solution to which vanadylribonucleoside was added to give a concen- 
tration of 10 mM) having been heated to 90 to 95 °C. An obtained solution was sufficiently agitated. 

To the solution, 100 mg of proteinase K was added, and it was agitated again. The solution was incubated at 40
°C for 2 hours, and then 16 ml of 2 M KCl was added. The solution was sufficiently agitated again, and it was
left to stand on ice for 1 hour, followed by centrifugation for 20 minutes (4 °C) at 12,000 g by using a high
speed refrigerated centrifuge.

The obtained supernatant was filtered to remove floating matter. The solution was transferred to a measuring
cylinder to measure the volume. The solution was then transferred to another centrifuge tube, to which lithium chloride
was added in an amount of 85 mg per 1 ml of the extract solution to give a final concentration of 2 M. The solution was
left to stand at 4 °C overnight, and then precipitated RNA was separated by centrifugation for 20 minutes
at 12,000 g. The obtained precipitate of RNA was washed and precipitated twice with cooled 2 M lithium chloride.
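
The stated lithium chloride dose (85 mg per 1 ml for a final concentration of 2 M) can be checked with a short calculation. The following is only a sketch: the atomic masses are rounded standard values, the function name is illustrative, and the small volume increase caused by dissolving the salt is ignored.

```python
# Rounded standard atomic masses in g/mol (assumed values, not from the source text).
LITHIUM = 6.94
CHLORINE = 35.45
LICL_MOLAR_MASS = LITHIUM + CHLORINE  # about 42.39 g/mol


def molarity_from_mg_per_ml(mg_per_ml: float, molar_mass: float) -> float:
    """Convert a dose in mg/ml to mol/L.

    mg/ml is numerically equal to g/L, so molarity = (g/L) / (g/mol).
    The volume change on dissolving the solid is neglected.
    """
    return mg_per_ml / molar_mass


licl_molarity = molarity_from_mg_per_ml(85.0, LICL_MOLAR_MASS)
print(f"85 mg/ml LiCl corresponds to roughly {licl_molarity:.2f} M")
```

Under these assumptions the dose works out very close to the 2 M quoted in the procedure.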

The obtained RNA was dissolved in 10 mM Tris buffer (pH 7.5) to give a concentration of about 2 mg/ml, to which 
5 M potassium acetate was added to give a concentration of 200 mM. Ethanol was added thereto to give a concentration 
of 70 %, followed by cooling at -80 °C for 10 minutes. Centrifugation was performed at 4 °C for 10 minutes at 15,000 
rpm, and then an obtained precipitate was suspended in an appropriate amount of sterilized water to give an RNA 
sample. As a result of quantitative measurement for the RNA sample, total RNA was obtained in an amount of 2 mg. 
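
The two additions in this step (5 M potassium acetate to 200 mM, then ethanol to 70 %) are ordinary stock-dilution problems. A minimal sketch follows, assuming additive volumes (only approximately true for ethanol and water), 100 % ethanol as the stock, and a hypothetical 1 ml RNA solution; the function name is illustrative.

```python
def stock_volume_to_add(sample_volume: float, stock_conc: float, target_conc: float) -> float:
    """Volume of stock to add so that the mixture reaches target_conc.

    Solves stock_conc * v = target_conc * (sample_volume + v) for v,
    assuming the sample contains none of the solute and volumes are additive.
    """
    if not 0 < target_conc < stock_conc:
        raise ValueError("target concentration must lie between 0 and the stock concentration")
    return sample_volume * target_conc / (stock_conc - target_conc)


rna_solution_ml = 1.0  # hypothetical sample volume, not given in the text

# 5 M potassium acetate stock, 0.2 M (200 mM) target.
potassium_acetate_ml = stock_volume_to_add(rna_solution_ml, 5.0, 0.2)

# 100 % ethanol stock (assumed), 70 % target, added after the salt.
ethanol_ml = stock_volume_to_add(rna_solution_ml + potassium_acetate_ml, 100.0, 70.0)

print(f"potassium acetate: {potassium_acetate_ml:.3f} ml, ethanol: {ethanol_ml:.2f} ml")
```

For any other starting volume, both results scale linearly.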



<2> Purification of mRNA 



mRNA was purified as a poly(A)+ RNA fraction from the total RNA obtained as described above. Purification was
performed by using Oligotex-dT30 <Super> (purchased from Toyobo) as oligo(dT)-immobilized latex for poly(A)+ RNA
purification.

Elution buffer (10 mM Tris-HCl (pH 7.5), 1 mM EDTA, 0.1 % SDS) was added to a solution containing 1 mg of the
total RNA to give a total volume of 1 ml, to which 1 ml of Oligotex-dT30 <Super> was added, followed by heating at
65 °C for 5 minutes and quick cooling on ice for 3 minutes. Then 0.2 ml of 5 M NaCl was added to the obtained solution,
and it was incubated at 37 °C for 10 minutes, followed by centrifugation at 15,000 rpm for 3 minutes. After that, the
supernatant was carefully removed.

The obtained pellet was suspended in 2.5 ml of Washing Buffer (10 mM Tris-HCl (pH 7.5), 1 mM EDTA, 0.5 M NaCl,
0.1 % SDS), and the suspension was centrifuged at 15,000 rpm for 3 minutes. After that, the supernatant was carefully
removed. The obtained pellet was suspended in 1 ml of TE Buffer, and then it was heated at 65 °C for 5 minutes. The
suspension was quickly cooled on ice for 3 minutes, and then it was centrifuged at 15,000 rpm for 3 minutes to recover
poly(A)+ mRNA contained in the obtained supernatant.

Thus, about 10 µg of poly(A)+ mRNA was obtained from 1 mg of the total RNA. An aliquot of 5
µg thereof was used to prepare a cDNA library.



<3> Preparation of cDNA library 
(1) Synthesis of cDNA 

The mRNA obtained as described above was used as a template to synthesize cDNA by using a λZAP cDNA
synthesis kit produced by Stratagene. The following solution was prepared and mixed in a tube.






5.0 µl 10 x 1st Strand Buffer (buffer for reverse transcription reaction);
3.0 µl 10 mM 1st Strand Methyl Nucleotide Mix (5-methyl dCTP, dATP, dGTP, dTTP mixture);
2.0 µl Linker-Primer (linker and primer);
H2O (adjusted to give a total volume of 50 µl);
1.0 µl RNase Block II (RNase inhibitor).

The respective components described above were contents of the kit. Linker-Primer had a sequence as shown in
SEQ ID NO: 13. Methylated nucleotide was used so that the cDNA would not be digested by the
restriction enzyme reaction performed later on. The reaction solution was agitated well, and then 5.0 µg of poly(A)+
mRNA was added thereto, followed by being left to stand at room temperature for 10 minutes. Further, 2.5 µl of M-MuLV
RTase (reverse transcriptase) was added (at this time, the total volume was 50 µl). The reaction solution was
gently mixed, followed by centrifugation under a mild condition to allow the reaction solution to fall to the bottom of the
tube. The reaction was performed at 37 °C for 60 minutes.

Next, the following solution was prepared and mixed in the tube in a certain order. 

45.0 µl reaction solution containing cDNA primary chain;
40.0 µl 10 x 2nd Strand Buffer (buffer for polymerase reaction);
6.0 µl 2nd Strand Nucleotide Mixture (A, G, C, T mixture);
302.0 µl H2O.

The following solution was further added. However, in order to allow RNase and DNA polymerase to simultaneously 
act, enzyme solutions were allowed to adhere to the wall of the tube. After that, a vortex treatment was promptly 
performed, and the reaction solutions were allowed to fall to the bottom of the tube by means of centrifugation to 
perform a reaction for synthesizing cDNA second strand at 16 °C for 150 minutes. 

0.8 µl RNase H (RNA-degrading enzyme);
7.5 µl DNA polymerase I (10.0 u/µl).

To the reaction solution, 400 µl of a mixed solution of phenol:chloroform (1:1) was added. Agitation was performed
well, followed by centrifugation at room temperature for 2 minutes. To the obtained supernatant, 400 µl of
phenol:chloroform was added again, which was subjected to a vortex treatment and centrifugation at room temperature for 2
minutes. The following solution was added to the obtained supernatant to precipitate cDNA.

33.3 µl 3 M sodium acetate solution;
867.0 µl 100 % ethanol.
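
The proportions of this precipitation mix can be made explicit with a small calculation. This is a sketch under one stated assumption: the recovered aqueous phase is taken to be 400 µl, which the text does not state (the second-strand reagents listed earlier sum to about 401 µl). The function name and dictionary keys are illustrative.

```python
def precipitation_mix(sample_ul: float, salt_ul: float, salt_stock_molar: float, ethanol_ul: float) -> dict:
    """Summarize an ethanol precipitation mix, assuming additive volumes."""
    total_ul = sample_ul + salt_ul + ethanol_ul
    return {
        "total_ul": total_ul,
        "salt_molar": salt_ul * salt_stock_molar / total_ul,
        "ethanol_fraction": ethanol_ul / total_ul,
        # Ethanol relative to the salted aqueous phase (the usual "two volumes" rule).
        "ethanol_volumes": ethanol_ul / (sample_ul + salt_ul),
    }


assumed_aqueous_phase_ul = 400.0  # assumption: not stated in the text
mix = precipitation_mix(assumed_aqueous_phase_ul, 33.3, 3.0, 867.0)

print(f"sodium acetate: {mix['salt_molar'] * 1000:.0f} mM")
print(f"ethanol: {mix['ethanol_fraction'] * 100:.0f} % v/v, "
      f"{mix['ethanol_volumes']:.2f} volumes relative to the aqueous phase")
```

With the assumed 400 µl, the listed 867 µl of ethanol is almost exactly two volumes of the salted aqueous phase.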

The obtained solution was left to stand at -20 °C overnight, and it was centrifuged at room temperature for 60
minutes. After that, washing was gently performed with 80 % ethanol, followed by centrifugation for 2 minutes. The
supernatant was removed. The obtained pellet was dried, and it was dissolved in 43.5 µl of sterilized water. The following
solution was added to an aliquot (39.0 µl) to blunt-end the cDNA terminals.

5.0 µl 10 x T4 DNA Polymerase Buffer (buffer for T4 polymerase reaction);
2.5 µl 2.5 mM dNTP Mix (A, G, C, T mixture);
3.5 µl T4 DNA polymerase (2.9 u/µl).

The reaction was performed at 37 °C for 30 minutes, to which 50 µl of distilled water was added, and then 100 µl
of phenol:chloroform was added thereto, followed by a vortex treatment and centrifugation for 2 minutes. To the obtained
supernatant, 100 µl of chloroform was added, which was subjected to a vortex treatment, followed by centrifugation
for 2 minutes. The following solution was added to the supernatant to precipitate cDNA.

7.0 µl 3 M sodium acetate solution;
226 µl 100 % ethanol.

The solution was left to stand on ice for 30 minutes or more, and it was centrifuged at 4 °C for 60 minutes. The
obtained precipitate was washed with 150 µl of 80 % ethanol, followed by centrifugation for 2 minutes and drying. The
cDNA pellet was dissolved in 7.0 µl of EcoRI adapter solution, to which the following solution was added to ligate the
EcoRI adapter to both ends of the cDNA. Sequences of respective strands of the EcoRI adapter are shown in SEQ ID
NO: 14 and Fig. 2.




1.0 µl 10 x Ligation Buffer (buffer for ligase reaction);
1.0 µl 10 mM ATP;
1.0 µl T4 DNA ligase.

The reaction solution was centrifuged under a mild condition, and it was left to stand at 4 °C overnight or longer.
The solution was treated at 70 °C for 30 minutes, and then it was centrifuged under a mild condition, followed by being
left to stand at room temperature for 5 minutes. The following solution was added to the reaction solution to
phosphorylate the 5'-terminals of the EcoRI adapter.

1.0 µl 10 x Ligation Buffer (buffer for ligase reaction);
2.0 µl 10 mM ATP;




The reaction was performed at 37 °C for 30 minutes, followed by a treatment at 70 °C for 30 minutes. The solution
was centrifuged under a mild condition, and it was left to stand at room temperature for 5 minutes. The following solution
was further added thereto to perform a reaction at 37 °C for 90 minutes so that the XhoI site introduced by Linker-Primer
was digested with XhoI, followed by being left to stand at room temperature to perform cooling.

28.0 µl XhoI Buffer;
3.0 µl XhoI (45 u/µl).

To the reaction solution, 5.0 µl of 10 x STE (10 mM Tris-HCl (pH 8.0), 100 mM NaCl, 1 mM EDTA) was added,
and the mixture was loaded onto a centrifuge column for removing short fragments (Sephacryl Spin Column) and
centrifuged at 600 g for 2 minutes to obtain an eluent which was designated as Fraction 1. This operation was further
repeated three times to obtain Fractions 2, 3, and 4 respectively. Fractions 3 and 4 were combined, to which
phenol:chloroform (1:1) was added and agitated well, followed by centrifugation at room temperature for 2 minutes. An
equal amount of chloroform was added to the obtained supernatant, and the obtained mixture was agitated well. The mixture
was centrifuged at room temperature for 2 minutes to obtain a supernatant to which a two-fold amount of 100 % ethanol
was added, followed by being left to stand at -20 °C overnight. The solution was centrifuged at 4 °C for 60 minutes,
followed by washing with an equal amount of 80 % ethanol. The solution was centrifuged at 4 °C for 60 minutes to
obtain a cDNA pellet which was suspended in 10 µl of sterilized water.

(2) Preparation of cDNA library 

The double strand cDNA obtained as described above was ligated with a λ phage expression vector to prepare a
recombinant vector. The following solution was prepared and mixed in a tube to perform a reaction at 12 °C overnight,
followed by being left to stand at room temperature for 2 hours to ligate cDNA with the vector.

2.5 µl cDNA solution;
0.5 µl 10 x Ligation Buffer;
0.5 µl 10 mM ATP;
1.0 µl λ-ZAP vector DNA (1 µg/µl);
0.5 µl T4 DNA ligase (4 Weiss u/µl).

(3) Packaging of phage DNA into phage particles 

The phage vector containing the cDNA was packaged into phage particles by using an in vitro packaging kit
(Gigapack II Gold packaging extract: produced by Stratagene). The recombinant phage solution was added to
Freeze/Thaw extract immediately after dissolution, and the solution was placed on ice, to which 15 µl of Sonic extract was
added, followed by mixing well by pipetting. The reaction solution was centrifuged under a mild condition, and it was left
to stand at room temperature (22 °C) for 2 hours. To the reaction solution, 500 µl of Phage Dilution Buffer was added,
to which 20 µl of chloroform was further added, followed by mixing. In order to measure the titer of the library, an aliquot
(2 µl) of 500 µl of the aqueous phase was diluted in a ratio of 1:10 with 18 µl of SM buffer (5.8 g of NaCl, 2 g of
MgSO4·7H2O, 50 ml of 1 M Tris-HCl (pH 7.5), and 5 ml of 2 % gelatin in 1 L). The diluted solution (1 µl) and the phage
stock solution (1 µl) were plated respectively together with 200 µl of a culture solution of Escherichia coli PLK-F' strain
having been cultivated to arrive at a value of OD600 of 0.5. That is, Escherichia coli PLK-F' strain was mixed with the
phage solution to perform cultivation at 37 °C for 15 minutes. The obtained culture was added to 2 to 3 ml of top agar
(48 °C), which was immediately overlaid on an NZY agar plate having been warmed at 37 °C. Cultivation was performed
overnight at 37 °C, and the plaques that appeared were counted to calculate the titer. As a result, the titer was 1.2 x 10^6 pfu/ml.
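
The titer calculation itself is not spelled out in the text. A minimal sketch of the usual plaque-count arithmetic follows. The plaque count of 120 is hypothetical, back-calculated from the reported 1.2 x 10^6 pfu/ml for 1 µl of the 1:10 dilution; the function names are illustrative.

```python
def titer_pfu_per_ml(plaque_count: float, plated_volume_ul: float, dilution_factor: float = 1.0) -> float:
    """Titer of the undiluted stock in pfu/ml.

    dilution_factor is the fold dilution of the plated sample (10 for a 1:10 dilution).
    """
    plated_volume_ml = plated_volume_ul / 1000.0
    return plaque_count * dilution_factor / plated_volume_ml


def expected_plaques(titer: float, plated_volume_ul: float, dilution_factor: float = 1.0) -> float:
    """Number of plaques expected when plating a given volume of a diluted stock."""
    return titer * (plated_volume_ul / 1000.0) / dilution_factor


# Hypothetical count: 120 plaques from 1 µl of the 1:10 dilution.
library_titer = titer_pfu_per_ml(plaque_count=120, plated_volume_ul=1.0, dilution_factor=10.0)
print(f"titer: {library_titer:.2e} pfu/ml")

# The same titer plated undiluted (1 µl of phage stock) would give about ten times more plaques.
print(f"undiluted plate: about {expected_plaques(library_titer, 1.0):.0f} plaques")
```

Plating both the stock and a 1:10 dilution, as described, gives two counts that should agree within about a factor of ten of each other.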

(4) Amplification of library 

The packaging solution containing about 50,000 recombinant bacteriophages and 600 µl of a culture solution of
Escherichia coli PLK-F' strain having been cultivated to have a value of OD600 of 0.5 were added to a centrifuge tube,
followed by cultivation at 37 °C for 15 minutes. To the culture solution, 6.5 ml of top agar having been
maintained at 48 °C after dissolution was added, which was overlaid on a 150 mm NZY plate having been warmed at about 37 °C,
followed by cultivation at 37 °C for 5 to 8 hours. 10 ml of SM Buffer was added to each of the plates, followed by
incubation at 4 °C overnight with gentle shaking. SM Buffer in the respective plates was collected in a sterilized
polypropylene tube. The respective plates were rinsed with 2 ml of SM Buffer, and the rinsing solutions were collected in
the same tube. Chloroform in an amount corresponding to 5 % of the total amount was added and mixed, followed by
being left to stand at room temperature for 15 minutes. Bacterial cells were removed by centrifugation at 4,000 g for
5 minutes. Chloroform in an amount corresponding to 0.3 % of the total amount was added to the obtained supernatant,
and it was stored at 4 °C. The titer of the library amplified as described above was measured in the same
manner as described above. As a result, the titer was 2.3 x 10^9 pfu/ml.

(5) Excision of plasmid from phage DNA 

In vivo excision of the plasmid portion from the recombinant phage DNA was performed. The following solution
was mixed in a 50 ml conical tube to cause infection at 37 °C for 15 minutes:

culture solution of Escherichia coli XL1-Blue (OD600 = 0.1) 200 µl;
phage solution after amplification 200 µl (> 1 x 10^5 phage particles);
helper phage R408 1 µl (> 1 x 10^6 pfu/ml).

To the mixed solution, 5 ml of 2 x YT medium was added to perform cultivation at 37 °C for 3 hours with shaking.
A heat treatment was applied thereto at 70 °C for 20 minutes, followed by centrifugation at 4,000 g for 5 minutes. The
obtained supernatant was decanted and transferred to a sterilized tube. Centrifugation was performed to obtain a
supernatant which was diluted 100 times to obtain a solution. An aliquot (20 µl) of the solution was mixed with 200 µl
of a culture solution of Escherichia coli XL1-Blue having been cultivated to obtain a value of OD600 of 1.0 to cause
infection at 37 °C for 15 minutes. Aliquots (1 to 100 µl) of the culture solution were plated on LB plates containing
ampicillin, followed by cultivation at 37 °C overnight. Colonies that appeared were randomly selected. Glycerol was
added to the selected colonies, and they were stored at -80 °C.

(6) Preparation of plasmid 

Plasmids were prepared by using Magic Mini-prep kit produced by Promega. The culture fluid of Escherichia coli
harboring the plasmid having been stored at -80 °C was inoculated into 5 ml of 2 x YT medium, followed by cultivation
at 37 °C overnight. Centrifugation was performed for 5 minutes (4,000 rpm, 4 °C), and the supernatant was removed by
decantation. To the obtained bacterial cell pellet, 1 ml of TE buffer was added, followed by a vortex treatment. The
obtained bacterial cell suspension was transferred to an Eppendorf tube, followed by centrifugation for 5 minutes (5,000
rpm, 4 °C). The resultant supernatant was removed by decantation.

To the obtained bacterial cell pellet, 300 µl of Cell Resuspension Solution was added, and the pellet was sufficiently
suspended therein. The obtained suspension was transferred to an Eppendorf tube. The suspension was agitated for
2 minutes with a mixer, to which 300 µl of Cell Lysis Solution was added, followed by agitation until the suspension
became transparent. Neutralization Solution (300 µl) was added thereto, and agitation was performed by shaking by
hand, followed by centrifugation for 10 minutes (15,000 rpm).

Only the obtained supernatant was transferred to a new Eppendorf tube (1.5 ml). A suction tube was prepared, to
which a cock, a miniature column and a syringe (injector) were connected in this order. A resin in an amount of 1 ml
was charged into the syringe. The supernatant was poured into the syringe, and agitation was performed well, followed
by suction. Column Washing Solution in an amount of 2 ml was added, and washing was performed while performing
suction. Suction was continued for 1 to 2 minutes in order to dry the column. The miniature column was removed from the
equipment, and it was set in a new Eppendorf tube (1.5 ml). Sterilized water in an amount of 100 µl having been warmed
at 65 to 70 °C was poured into the miniature column, and the column and the Eppendorf tube were centrifuged together
for 1 minute (5,000 rpm). The eluted solution was transferred to an Eppendorf tube, to which 5 µl of 3 M sodium acetate
aqueous solution was added, and 250 µl of cold ethanol was added thereto. The solution was centrifuged (15,000 rpm,
25 minutes), and the supernatant was discarded. To the obtained precipitate, 1 ml of 70 % ethanol was added, followed
by centrifugation again (15,000 rpm, 3 minutes). Ethanol was completely removed, and the tube was vacuum-dried in
a desiccator. The precipitate was sufficiently dissolved in 20 µl of sterilized water, and the obtained solution was stored
at -20 °C. An aliquot (1 µl) of the solution was dispensed, and it was subjected to electrophoresis together with volume
markers to quantitatively determine the plasmid DNA.

<4> Determination of nucleotide sequence of cDNA and homology search with gene data base 

(1) Determination of nucleotide sequence of cDNA



The nucleotide sequence of cDNA was analyzed by using DNA automatic sequencer 373A produced by Applied 




(2) Homology search

Partial sequences of about 750 clones were searched with a computer using BlastX. As a result, three clones
appeared to be homologues of a bacterial cellulose synthase subunit. Therefore, isolation of full length clones was attempted.
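
BlastX compares a nucleotide query with protein sequences by first translating the query in all six reading frames. The sketch below illustrates only that translation step with the standard genetic code; it is not the search program itself, and the function names are illustrative. The example string is the start of the coding region of SEQ ID NO: 1 as printed in the sequence listing, assuming the seventh codon reads CCT (consistent with the Pro shown beneath it), preceded by two arbitrary bases to shift the frame.

```python
from itertools import product

# Standard genetic code, codons ordered with T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(dna: str) -> str:
    """Translate a DNA string from its first base; unknown codons become 'X'."""
    dna = dna.upper()
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))


def reverse_complement(dna: str) -> str:
    return dna.upper().translate(COMPLEMENT)[::-1]


def six_frame_translations(dna: str) -> dict:
    """Return translations keyed '+1'..'+3' (forward) and '-1'..'-3' (reverse strand)."""
    frames = {}
    for strand, seq in (("+", dna.upper()), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


# Two arbitrary leading bases put the coding sequence in frame +3.
query = "GA" + "ATGATGGAATCTGGGGTTCCTGTTTGCCACACT"
for name, protein in six_frame_translations(query).items():
    print(name, protein)
```

In an actual search, each of the six translated frames would then be aligned against the protein database; here frame +3 is the one that reproduces the amino acids listed under the first codons of SEQ ID NO: 1.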


<5> Isolation of full length clones 
(1) 5'-RACE 

As a result of the homology search, the obtained homologue clones were found to be partial length clones.
Therefore, primers were synthesized to extend the clones toward the 5' upstream side, and RT-PCR was performed by using
mRNA as a template.

(1-a) Synthesis of first-strand DNA

The following solution was prepared and mixed in a tube.

0.5 µl 10 µmol gene-specific primer 1;
1 µg total RNA;
DEPC-treated H2O (adjusted to give a total amount of 9 µl).

The following oligonucleotides were used as the gene-specific primer 1. That is, an oligonucleotide having a
nucleotide sequence shown in SEQ ID NO: 15 was used for PcsA1. An oligonucleotide having a nucleotide sequence
shown in SEQ ID NO: 16 was used for PcsA2. An oligonucleotide having a nucleotide sequence shown in SEQ ID NO:
17 was used for PcsA3.

The reaction solution was gently mixed, and then it was centrifuged under a mild condition to allow the reaction 
solution to fall to the bottom of the tube. The solution was left to stand at 70 °C for 10 minutes, followed by immediate 
cooling on ice. 

Next, the following solution was prepared and mixed in the tube. 

5 x RT Buffer 5 µl;
25 mM MgCl2 2.5 µl;
2 mM dNTP mix 5 µl;
0.1 M DTT 2.5 µl;
H2O (added to give a total amount of 24 µl).

The solution was gently agitated, and then it was centrifuged under a mild condition to allow the reaction solution
to fall to the bottom of the tube, followed by being left to stand at 42 °C for 1 minute. To the solution, 1 µl
of SuperScript II RT (reverse transcriptase, GIBCO BRL) was added, and it was gently mixed. After that, the reaction was performed
at 42 °C for 50 minutes. Subsequently, the reaction solution was left to stand at 70 °C for 15 minutes to stop the
reaction. Centrifugation was performed under a mild condition to allow the reaction solution to fall to the bottom of the
tube, followed by being left to stand at 37 °C. RNase H (produced by Toyobo) in an amount of 1 µl was added thereto
to perform a reaction at 37 °C for 30 minutes.

Subsequently, in order to remove excessive primers and nucleotides contained in the reaction solution, gel filtration
was performed by using a purification column produced by Boehringer, Quick Spin Columns. At first, the tip of the
column was removed, followed by centrifugation at 1,100 x g for 2 minutes to discard the buffer. The reaction solution
was introduced into the central area of the column, followed by centrifugation at 1,100 x g for 4 minutes to recover the
solution.



(1-b) Poly(dC) tailing 

An aliquot (5 µl) was dispensed from the obtained solution, to which the following solution was added.

5 µl 5 x CoCl2 Buffer;
2.5 µl 2 mM dCTP;
H2O (adjusted to give a total amount of 24 µl).



The reaction solution was mixed well, and it was left to stand at 94 °C for 3 minutes. Centrifugation was performed 
under a mild condition to allow the reaction solution to fall to the bottom of the tube, followed by being left to stand on 
ice. Terminal transferase TdT (produced by Toyobo) was added thereto in an amount of 1 µl, followed by mixing under
a mild condition to perform a reaction at 37 °C for 10 minutes. Subsequently, the reaction solution was left to stand at 
65 °C for 10 minutes to stop the reaction. 

(1-c) PCR reaction 



An aliquot (2.5 µl) was dispensed from the reaction solution, to which the following solution was added.

2.5 µl 10 x PCR Buffer;
2.5 µl 2 mM dNTP mix;
0.5 µl Gene-specific primer 2;
0.5 µl Abridged Anchor Primer (GIBCO BRL);
0.5 µl Advantage Klentaq Polymerase Mix (Clontech);
H2O (adjusted to give a total amount of 25 µl).



The following oligonucleotides were used as Gene-specific primer 2. That is, an oligonucleotide having a nucleotide
sequence shown in SEQ ID NO: 18 was used for PcsA1. An oligonucleotide having a nucleotide sequence shown in
SEQ ID NO: 19 was used for PcsA2. An oligonucleotide having a nucleotide sequence shown in SEQ ID NO: 20 was
used for PcsA3.

The solution was introduced into a 0.2 ml tube to perform the PCR reaction under the following condition.



PAD          94 °C          90 seconds
30 cycles    94 °C          30 seconds
             60 to 68 °C    30 to 60 seconds
             68 °C          180 seconds
Final        68 °C          7 minutes
Hold         4 °C

The reaction solution was subjected to agarose gel electrophoresis to extract, from the gel, DNAs corresponding
to portions having the largest size (about 1.8 kb for PcsA1, about 2 kb for PcsA2, and about 2.2 kb for PcsA3).
GENO-BIND produced by CLONTECH was used for the extraction, and the procedure was carried out in accordance with its
protocol. The DNA thus obtained was subjected to poly(dC) tailing, and the product was used as a template to perform the PCR
reaction. The condition and the composition of the reaction solution were the same as those described above.
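
The cycling conditions given for the PCR reaction can be written down as data, which makes it easy to estimate the programmed run time. This is a sketch only: the class and function names are illustrative, the annealing step is evaluated at both ends of the stated range, and ramp times between temperatures and the final 4 °C hold are ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    temp_c: float
    seconds: int


def build_cycle(anneal_temp_c: float, anneal_seconds: int) -> list:
    """One cycle: 94 °C for 30 s, annealing (60 to 68 °C, 30 to 60 s), 68 °C for 180 s."""
    return [Step(94, 30), Step(anneal_temp_c, anneal_seconds), Step(68, 180)]


def program_seconds(initial: Step, cycle: list, n_cycles: int, final: Step) -> int:
    """Total programmed hold time, excluding ramping and the 4 °C hold."""
    return initial.seconds + n_cycles * sum(step.seconds for step in cycle) + final.seconds


INITIAL = Step(94, 90)      # initial step at 94 °C, 90 seconds
FINAL = Step(68, 7 * 60)    # final extension at 68 °C, 7 minutes

shortest = program_seconds(INITIAL, build_cycle(60, 30), 30, FINAL)
longest = program_seconds(INITIAL, build_cycle(68, 60), 30, FINAL)

print(f"programmed time: {shortest / 60:.1f} to {longest / 60:.1f} minutes")
```

The 30 repetitions of the 180-second extension step account for most of the programmed time.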



(2) Cloning 



(2-a) 5'-RACE TA cloning 


Starting from the obtained PCR reaction solution, cloning was performed by using TA Cloning Kit produced by 
Invitrogen in accordance with its protocol. 

The following solution was added to an aliquot (1.5 µl) of the PCR reaction solution obtained as described above.

0.5 µl 10 x Ligation Buffer;
1 µl pCRII vector;
0.5 µl T4 DNA Ligase;
1.5 µl dH2O.

The reaction was performed at 14 °C overnight. An aliquot (2 µl) of the reaction solution was added to 25 µl of
Escherichia coli competent cell (JM109) preparation, followed by being left to stand for 30 minutes on ice. After that,
heat shock was applied at 42 °C for 30 seconds. The solution was left to stand on ice for 2 minutes, to
which 450 µl of SOB medium was thereafter added to perform cultivation at 37 °C for 1 hour with shaking at 200 rpm.
The culture was spread over an Amp/Xgal/IPTG plate, followed by incubation at 37 °C overnight. The plasmid was
extracted from the obtained colonies in accordance with the method as described above.




The procedure was carried out by using DNA Sequencer 377 produced by ABI in accordance with its protocol.
The sequencing reaction was performed by using M13 primer and synthetic oligomer as primers, based on the use of
Dye Terminator Cycle Sequencing Kit produced by the same company. As a result of the sequencing, as for PcsA3, it
was revealed that another clone also belonging to the group of PcsA3 but having a slightly different sequence (one
position for amino acid) was isolated (see Figs. 3 and 4). A nucleotide sequence of a clone (PcsA3-682) containing
the 3'-side region of PcsA3 and an amino acid sequence deduced from this nucleotide sequence are shown in SEQ
ID NOs: 5 and 6. A nucleotide sequence of a 5'-portion (PcsA3-5') of another clone containing the 5'-side region of
PcsA3 and an amino acid sequence deduced from this nucleotide sequence are shown in SEQ ID NOs: 7 and 8. A
nucleotide sequence of a 3'-portion (PcsA3-3') of the clone and an amino acid sequence deduced from this nucleotide
sequence are shown in SEQ ID NOs: 9 and 10.

As for PcsA1 and PcsA2, primers for the 5'-terminal and 3'-terminal of a region containing ORF were synthesized on
the basis of the obtained sequences to perform the PCR reaction. Thus, complete length clones were isolated by
means of TA cloning. The condition and the composition of the reaction solution were the same as those described
above.

Oligonucleotides shown in SEQ ID NO: 21 (5'-terminal) and SEQ ID NO: 22 (3'-terminal) were used as the primers
for PcsA1. Oligonucleotides shown in SEQ ID NO: 23 (5'-terminal) and SEQ ID NO: 24 (3'-terminal) were used as the
primers for PcsA2. Results are shown in SEQ ID NOs: 1 to 4.

Annex to the description

SEQUENCE LISTING

(1) GENERAL INFORMATION:

(i) APPLICANT: NISSHINBO INDUSTRIES, INC.
    HAYASHI, Takahisa
(ii) TITLE OF INVENTION: CELLULOSE SYNTHASE GENE
(iii) NUMBER OF SEQUENCES: 24
(iv) CORRESPONDENCE ADDRESS:
    (A) ADDRESSEE:
    (B) STREET:
    (C) CITY:
    (E) COUNTRY:
    (F) ZIP:
(v) COMPUTER READABLE FORM:
    (A) MEDIUM TYPE: Floppy disk
    (B) COMPUTER: IBM PC compatible
    (C) OPERATING SYSTEM: PC-DOS/MS-DOS
    (D) SOFTWARE: PatentIn Release #1.0, Version #1.30
(vi) CURRENT APPLICATION DATA:
    (A) APPLICATION NUMBER:
    (B) FILING DATE:
    (C) CLASSIFICATION:
(vii) PRIOR APPLICATION DATA:
    (A) APPLICATION NUMBER: JP 9-83133
    (B) FILING DATE: 1-APR-1997
(viii) ATTORNEY/AGENT INFORMATION:
    (A) NAME:
    (B) REGISTRATION NUMBER:
(ix) TELECOMMUNICATION INFORMATION:
    (A) TELEPHONE:
    (B) TELEFAX:

(2) INFORMATION FOR SEQ ID NO: 1:

(i) SEQUENCE CHARACTERISTICS:
    (A) LENGTH: 3207 base pairs
    (B) TYPE: nucleic acid
    (C) STRANDEDNESS: double
    (D) TOPOLOGY: linear
(ii) MOLECULE TYPE: cDNA to mRNA
(vi) ORIGINAL SOURCE:
    (A) ORGANISM: Gossypium hirsutum L.
    (C) INDIVIDUAL ISOLATE: Coker312
(ix) FEATURE:
    (A) NAME/KEY: CDS
    (B) LOCATION: 77..3001
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 1:
GGTTAGCATA TTGTTTGTAG CATTGGGTTT TTTTCTCAAG GAAGAAGAAG GAGAAAGATA
AGTAATGTTT TTGAGA ATG ATG GAA TCT GGG GTT CCT GTT TGC CAC ACT
Met Met Glu Ser Gly Val Pro Val Cys His Thr
1 5 10



50 



TGC 
Cys 

err 

Leu 

GAA 

Glu 
60 
ATG 
Met 

CAT 
His 

GGG 
Gly 

AAC 

Asn 

ATC 

He 
140 
CAG 
Gin 

TAC 
Tyr 



CAT 
His 

AAG 
Lys 
45 
AAC 
Asn 

GCT 
Ala 

ATC 
He 

AAT 
Asn 

AAG 
Lys 
125 
CCA 
Pro 

CCC 
Pro 

CGA 
Arg 



GAA 
Glu 
30 
GAA 
Glu 

CTG 
Leu 

GCA 
Ala 

AGC 
Ser 

COG 
Pro 
110 
AAG 
Lys 

OCT 
Pro 

CTC 
Leu 

ACC 
Thr 



15 
TGT AAT 
Cys Asn 

GGA GAA 
Gly Gin 

TTG GAC 
Leu Asp 



CAT 
His 

AGT 
Ser 
95 
ATT 
He 



TTG 

Leu 
80 
GTG 
Val 

TGG 
Trp 



AAG AAG 
Lys Lys 

GAG CAA 
Glu Gin 

TOG ACT 
Ser Thr 
160 
CTG ATC 
Val He 



TTC OCT 
Phe Pro 

AAA GCT 
Lys Ala 
50 

GAT CTC 
Asp Val 

65 
AGC AAG 
Ser Lys 

TCT ACA 
Ser Thr 

AAG AAC 
Lys Asn 

OCT GCA 
Pro Ala 
130 
CAA ATC 
Gin Met 
145 

ATA ATT 
He He 

ATT ATG 
He Met 



60 
109 




20 25 
ATT TGT AAG ACT TCT TTT GAG TAT GAT 205 
He Cys Lys Ser Cys Phe Glu Tyr Asp 

35 40 
TGC TTG OCT TCT GCT ATT COG TAT GAT 253 
Cys Leu Arg Cys Gly He Pro Tyr Asp 

55 

GAG AAG GCC ACC GGC GAT CAA TOG ACA 301 
Glu Lys Ala Thr Gly Asp Gin Ser Thr 

70 75 
TCT CAG GAT GTT GGA ATT CAT GCA AGA 349 
Ser Gin Asp Val Gly He His Ala Arg 

85 90 
TTG GAT ACT GAA ATC ACT GAA GAC AAT 397 
Leu Asp Ser Glu Met Thr Glu Asp Asn 

100 105 
AGG CTG GAA ACT TGG AAA GAA AAG AAG 445 
Arg Val Glu Ser Trp Lys Glu Lys Lys 
115 120 
ACA ACT AAG GTT GAA AGA GAG GCT GAA 493 
Thr Thr Lys Val Glu Arg Glu Ala Glu 
135 

GAA GAT AAA COG GCA COG GAT GCT TGC 541 
Glu Asp Lys Pro Ala Pro Asp Ala Ser 

150 155 
CCA ATC COG AAA AGC AGA CTT GCA CCA 589 
Pro He Pro Lys Ser Arg Leu Ala Pro 

165 170 
CGA TTG ATC ATT CTC GCT CTT TTC TTC 637 
Arg Leu He He Leu Gly Leu Phe Phe 



175 180 185

CAT TAT CGA GTA ACA AAC CCC GTT GAC AGT GCT TTT GGA CTG TGG CTC 685
His Tyr Arg Val Thr Asn Pro Val Asp Ser Ala Phe Gly Leu Trp Leu
190 195 200

ACT TCA GTC ATA TGT GAA ATC TGG TTT GCT TTT TCC TGG GTG TTG GAT 733
Thr Ser Val Ile Cys Glu Ile Trp Phe Ala Phe Ser Trp Val Leu Asp
205 210 215

CAG TTC CCT AAG TGG TAT CCT GTT AAC AGG GAA ACA TAC ATT GAC AGA 781
Gln Phe Pro Lys Trp Tyr Pro Val Asn Arg Glu Thr Tyr Ile Asp Arg
220 225 230 235

CTG TCT GCA AGA TAT GAA AGA GAA GGT GAA CCT AAT GAA CTT GCT GCA 829
Leu Ser Ala Arg Tyr Glu Arg Glu Gly Glu Pro Asn Glu Leu Ala Ala
240 245 250

GTT GAC TTC TTT GTG AGT ACA GTG GAT CCA TTG AAA GAG CCT CCA TTG 877
Val Asp Phe Phe Val Ser Thr Val Asp Pro Leu Lys Glu Pro Pro Leu
255 260 265

ATT ACT GCC AAT ACT GTG CTT TCC ATC CTT GCC TTG GAC TAC CCG GTA 925
Ile Thr Ala Asn Thr Val Leu Ser Ile Leu Ala Leu Asp Tyr Pro Val
270 275 280

GAT AAG GTC TCT TGT TAT ATA TCT GAT GAT GGT GCG GCC ATG CTG ACA 973
Asp Lys Val Ser Cys Tyr Ile Ser Asp Asp Gly Ala Ala Met Leu Thr
285 290 295

TTT GAA TCT CTA GTA GAA ACA GCC GAC TTT GCA AGA AAG TGG GTT CCA 1021
Phe Glu Ser Leu Val Glu Thr Ala Asp Phe Ala Arg Lys Trp Val Pro
300 305 310 315

TTC TGC AAA AAA TTT TCC ATT GAA CCA CGG GCA CCT GAG TTT TAC TTC 1069
Phe Cys Lys Lys Phe Ser Ile Glu Pro Arg Ala Pro Glu Phe Tyr Phe
320 325 330

TCA CAG AAG ATT GAT TAC TTG AAA GAT AAA GTG CAG CCC TCT TTT GTA 1117
Ser Gln Lys Ile Asp Tyr Leu Lys Asp Lys Val Gln Pro Ser Phe Val
335 340 345

AAA GAA CGT AGA GCT ATG AAA AGA GAT TAC GAA GAG TAC AAA ATT CGA 1165
Lys Glu Arg Arg Ala Met Lys Arg Asp Tyr Glu Glu Tyr Lys Ile Arg
350 355 360

ATC AAT GCT TTA GTT GCA AAG GCT CAG AAA ACA CCT GAA GAA GGA TGG 1213
Ile Asn Ala Leu Val Ala Lys Ala Gln Lys Thr Pro Glu Glu Gly Trp
365 370 375

ACA ATG CAA GAT GGA ACT CCT TGG CCG GGA AAT AAC CCG CGT GAT CAC 1261
Thr Met Gln Asp Gly Thr Pro Trp Pro Gly Asn Asn Pro Arg Asp His
380 385 390 395

CCT GGC ATG ATT CAG GTT TTC CTT GGA TAT AGC GGT GCT CAT GAC ATC 1309
Pro Gly Met Ile Gln Val Phe Leu Gly Tyr Ser Gly Ala His Asp Ile
400 405 410

GAA GGA AAT GAA CTT CCC CGA CTG GTT TAC GTC TCT AGA GAG AAG AGA
Glu Gly Asn Glu Leu Pro Arg Leu Val Tyr Val Ser Arg Glu Lys Arg
415 420 425

CCT GGC TAC CAA CAC CAC AAA AAG GCT GGT GCT GAA AAT GCT TTG GTT
Pro Gly Tyr Gln His His Lys Lys Ala Gly Ala Glu Asn Ala Leu Val
430 435 440

GTT CTT ACA AAT OCT C0C TIC ATC CTC AAT CTT GAT 



1357 



1405 



1453 




so 



445 
TCT GAC 
Cys Asp 
460 

TTC TTG 
Phe Leu 

OCT CAA 
Pro Gin 

AAC ACA 
Asn Thr 

GGG OCT 
Gly Pro 
525 
TAT GGC 
Tyr Gly 
540 

TCA TCT 
Ser Ser 

TCA GAG 
Ser Glu 

TTT AAC 
Phe Asn 

TTG ATC 
Leu lie 
605 



CAC 
His 

ATG 
Met 

AGA 
Arg 

GTT 

val 
510 
GTT 

Val 

TAT 

Tyr 

TGC 
Cys 

CTT 
Leu 

CTT 
Leu 
590 
TCT 
Ser 



TAT 
Tyr 

GAC 
Asp 

TIT 
Phe 
495 
TTC 

Phe 



CTT AAC 
Val Asn 
465 
CCA CAA 
Pro Gin 
480 

GAT GGC 
Asp Gly 

TTT GAT 
Phe Asp 



TAT GTG GGA 
Tyr Val Gly 

GGT CCA CCT 
Gly Pro Pro 
545 

TOG TCT TGC 
Ser Cys Cys 

560 
TAT AGG GAT 
Tyr Arg Asp 
575 

AGG GAA ATT 
Arg Glu lie 

CAA ACA AGC 
Gin Ttir Ser 



AAT AGC AAG GCA GTT AGG GAG GCA ATG TGC 
Asn Ser Lys Ala Val Arg Glu Ala Met Cys 

470 475 
GTC GGT CGA GAT GTC TGC TAT GTG CAG TTT 
Val Gly Arg Asp Val Cys Tyr Val Gin Phe 

485 490 
ATA GAT AGG ACT GAT CGA TAT GCC AAT CGG 
lie Asp Arg Ser Asp Arg Tyr Ala Asn Arg 

500 505 
GTT AAC ATG AAA GGT CTT GAT GGA ATC CAA 
Val Asn Met Lys Gly Leu Asp Gly He Gin 

515 520 
ACA GGT TGT GIT TTC AAT AGG CAA GCA CTT 
Thr Gly Cys Val Phe Asn Arg Gin Ala Leu 
530 535 

TCA ATG OCA ACT TTT COC AAG TCA T0C TOC 
Ser Met Pro Ser Phe Pro Lys Ser Ser Ser 

550 555 
TGC CCC GGC AAG AAG GAA CCT AAA GAT CCA 
Cys Pro Gly Lys Lys Glu Pro Lys Asp Pro 

565 570 
GCA AAA CGG GAA GAA CTT GAT GCT GCC ATC 
Ala Lys Arg Glu Glu Leu Asp Ala Ala He 

580 585 
GAC AAT TAT GAT GAG TAT GAA AGA TCA ATG 
Asp Asn Tyr Asp Glu Tyr Glu Arg Ser Met 

595 600 
TTT GAG AAA ACT TTT GGC TTA TCT TCA GTC 
Phe Glu Lys Thr Phe Gly Leu Ser Ser Val 
610 615 



1501 



1549 



1597 



1645 
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TIC ATT GAA TCT ACA CTA ATG GAG AAT GGA GGA GTG OCT GAA TCT GOC 1981 
Phe lie Glu Ser Thr Leu Met Glu Asn Gly Gly Val Ala Glu Ser Ala 
620 625 630 635 

AAC CCT TCC ACA CTA ATC AAG GAA GCA ATT CAT GTC ATC GGC TGT GGC 2029
Asn Pro Ser Thr Leu Ile Lys Glu Ala Ile His Val Ile Gly Cys Gly
640 645 650

TAT GAG GAG AAG ACT GCA TGG GGG AAA GAG ATT GGA TGG ATA TAT GGT 2077
Tyr Glu Glu Lys Thr Ala Trp Gly Lys Glu Ile Gly Trp Ile Tyr Gly
655 660 665

TCA GTC ACT GAG GAT ATC TTA ACC GGC TTC AAA ATG CAC TGC CGA GGA 2125
Ser Val Thr Glu Asp Ile Leu Thr Gly Phe Lys Met His Cys Arg Gly
670 675 680

TGG AGA TCG ATT TAC TGC ATG CCC TTA AGG CCA GCA TTC AAA GGA TCT 2173
Trp Arg Ser Ile Tyr Cys Met Pro Leu Arg Pro Ala Phe Lys Gly Ser
685 690 695

GCA CCC ATC AAT CTG TCT GAT CGG TTG CAC CAG GTT CTT CGA TGG GCT 2221
Ala Pro Ile Asn Leu Ser Asp Arg Leu His Gln Val Leu Arg Trp Ala
700 705 710 715

CTT GGA TCT GTT GAA ATT TTC CTA AGC AGG CAT TGC CCT CTA TGG TAT 2269
Leu Gly Ser Val Glu Ile Phe Leu Ser Arg His Cys Pro Leu Trp Tyr
720 725 730

GGC TTT GGA GGT GGT CGT CTT AAA TGG CTT CAA AGA CTA GCA TAT ATA 2317
Gly Phe Gly Gly Gly Arg Leu Lys Trp Leu Gln Arg Leu Ala Tyr Ile
735 740 745

AAC ACC ATT GTC TAT CCT TTC ACA TCC CTT CCA CTC ATT GCC TAT TGT 2365
Asn Thr Ile Val Tyr Pro Phe Thr Ser Leu Pro Leu Ile Ala Tyr Cys
750 755 760

TCA CTA CCA GCA ATC TGT CTT CTC ACA GGA AAA TTT ATC ATA CCA ACG 2413
Ser Leu Pro Ala Ile Cys Leu Leu Thr Gly Lys Phe Ile Ile Pro Thr
765 770 775

CTC TCA AAC CTG GCA AGT GTT CTC TTT CTT GGC CTT TTC CTT TCC ATT 2461
Leu Ser Asn Leu Ala Ser Val Leu Phe Leu Gly Leu Phe Leu Ser Ile
780 785 790 795

ATC GTG ACT GCT GTT CTC GAG CTC CGA TGG AGT GGT GTC AGC ATT GAG 2509
Ile Val Thr Ala Val Leu Glu Leu Arg Trp Ser Gly Val Ser Ile Glu
800 805 810

GAC TTA TGG CGT AAC GAG CAG TTT TGG GTC ATC GGT GGC GTT TCA GCC 2557
Asp Leu Trp Arg Asn Glu Gln Phe Trp Val Ile Gly Gly Val Ser Ala
815 820 825

CAT CTC TTT GCC GTC TTC CAA GGT TTC CTT AAG ATG CTT GCG GGC ATT 2605
His Leu Phe Ala Val Phe Gln Gly Phe Leu Lys Met Leu Ala Gly Ile
830 835 840

GAC ACC AAC TTT ACT GTC ACT GCC AAA GCA GCT GAT GAT GCA GAT TTT 2653
Asp Thr Asn Phe Thr Val Thr Ala Lys Ala Ala Asp Asp Ala Asp Phe
845 850 855

GGT GAG CTC TAC ATT GTG AAA TGG ACT ACA CTT CTA ATC CCT CCA ACA 2701
Gly Glu Leu Tyr Ile Val Lys Trp Thr Thr Leu Leu Ile Pro Pro Thr
860 865 870 875

ACA CTC CTC ATC GTC AAC ATG GTT GGT GTC GTT GCC GGA TTC TCC GAT 2749
Thr Leu Leu Ile Val Asn Met Val Gly Val Val Ala Gly Phe Ser Asp
880 885 890

Ala Leu Asn Lys Gly Tyr Glu Ala Trp Gly Pro Leu Phe Gly Lys Val
895 900 905

TTC TTT TCC TTC TGG GTC ATC CTC CAT CTT TAT CCA TTC CTC AAA GGT 2845
Phe Phe Ser Phe Trp Val Ile Leu His Leu Tyr Pro Phe Leu Lys Gly
910 915 920

CTT ATG GGA CGC CAA AAC AGG ACA CCA ACC ATT GTT GTC CTT TGG TCA 2893
Leu Met Gly Arg Gln Asn Arg Thr Pro Thr Ile Val Val Leu Trp Ser
925 930 935

GTG TTG TTG GCT TCT GTC TTC TCT CTT GTT TGG GTT CGG ATC AAC CCG 2941
Val Leu Leu Ala Ser Val Phe Ser Leu Val Trp Val Arg Ile Asn Pro
940 945 950 955

TTT GTC AGC ACC GCC GAT AGC ACC ACC GTG TCA CAG AGC TGC ATT TCC 2989
Phe Val Ser Thr Ala Asp Ser Thr Thr Val Ser Gln Ser Cys Ile Ser

960 965 970 

ATT GAT TGT TGATGATATT ATGTCTTTCT TAGAATOGAA ATCATTGCAA 3038 

Ile Asp Cys

GTAAGTQGAC TGAAACATGT CTATTGACTA AGTTTTGAAC AGTTTGTACC CATTITATTC 3098 

TTAGCAGTGT GTAATTTTCC TAAACAATGC TATGAACTAT ACATATTTCA TTGATATTTA 3158 

CATTAAATGA AACTACATCA GTCT9CAGAA AAAAAAAAAA AAAAAAAAA 3207 

(2) INFORMATION FOR SEQ ID NO: 2: 
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 974 amino acids

(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2:
Met Met Glu Ser Gly Val Pro Val Cys His Thr Cys Gly Glu His Val
1 5 10 15

Gly Leu Asn Val Asn Gly Glu Pro Phe Val Ala Cys His Glu Cys Asn
20 25 30

Phe Pro Ile Cys Lys Ser Cys Phe Glu Tyr Asp Leu Lys Glu Gly Gln
35 40 45

Lys Ala Cys Leu Arg Cys Gly Ile Pro Tyr Asp Glu Asn Leu Leu Asp
50 55 60

Asp Val Glu Lys Ala Thr Gly Asp Gln Ser Thr Met Ala Ala His Leu
65 70 75 80

Ser Lys Ser Gln Asp Val Gly Ile His Ala Arg His Ile Ser Ser Val
85 90 95

Ser Thr Leu Asp Ser Glu Met Thr Glu Asp Asn Gly Asn Pro Ile Trp
100 105 110

Lys Asn Arg Val Glu Ser Trp Lys Glu Lys Lys Asn Lys Lys Lys Lys
115 120 125

Pro Ala Thr Thr Lys Val Glu Arg Glu Ala Glu Ile Pro Pro Glu Gln
130 135 140

Gln Met Glu Asp Lys Pro Ala Pro Asp Ala Ser Gln Pro Leu Ser Thr
145 150 155 160

Ile Ile Pro Ile Pro Lys Ser Arg Leu Ala Pro Tyr Arg Thr Val Ile
165 170 175

Ile Met Arg Leu Ile Ile Leu Gly Leu Phe Phe His Tyr Arg Val Thr
180 185 190

Asn Pro Val Asp Ser Ala Phe Gly Leu Trp Leu Thr Ser Val Ile Cys
195 200 205

Glu Ile Trp Phe Ala Phe Ser Trp Val Leu Asp Gln Phe Pro Lys Trp
210 215 220

Tyr Pro Val Asn Arg Glu Thr Tyr Ile Asp Arg Leu Ser Ala Arg Tyr
225 230 235 240

Glu Arg Glu Gly Glu Pro Asn Glu Leu Ala Ala Val Asp Phe Phe Val
245 250 255

Ser Thr Val Asp Pro Leu Lys Glu Pro Pro Leu Ile Thr Ala Asn Thr
260 265 270

Val Leu Ser Ile Leu Ala Leu Asp Tyr Pro Val Asp Lys Val Ser Cys
275 280 285

Tyr Ile Ser Asp Asp Gly Ala Ala Met Leu Thr Phe Glu Ser Leu Val
290 295 300

Glu Thr Ala Asp Phe Ala Arg Lys Trp Val Pro Phe Cys Lys Lys Phe
305 310 315 320

Ser Ile Glu Pro Arg Ala Pro Glu Phe Tyr Phe Ser Gln Lys Ile Asp
325 330 335

Tyr Leu Lys Asp Lys Val Gln Pro Ser Phe Val Lys Glu Arg Arg Ala
340 345 350

Met Lys Arg Asp Tyr Glu Glu Tyr Lys Ile Arg Ile Asn Ala Leu Val
355 360 365

Ala Lys Ala Gln Lys Thr Pro Glu Glu Gly Trp Thr Met Gln Asp Gly
370 375 380

Thr Pro Trp Pro Gly Asn Asn Pro Arg Asp His Pro Gly Met Ile Gln
385 390 395 400

Val Phe Leu Gly Tyr Ser Gly Ala His Asp Ile Glu Gly Asn Glu Leu
405 410 415

Pro Arg Leu Val Tyr Val Ser Arg Glu Lys Arg Pro Gly Tyr Gln His
420 425 430

His Lys Lys Ala Gly Ala Glu Asn Ala Leu Val Arg Val Ser Ala Val
435 440 445

Leu Thr Asn Ala Pro Phe Ile Leu Asn Leu Asp Cys Asp His Tyr Val
450 455 460

Asn Asn Ser Lys Ala Val Arg Glu Ala Met Cys Phe Leu Met Asp Pro
465 470 475 480

Gln Val Gly Arg Asp Val Cys Tyr Val Gln Phe Pro Gln Arg Phe Asp
485 490 495

Gly Ile Asp Arg Ser Asp Arg Tyr Ala Asn Arg Asn Thr Val Phe Phe
500 505 510

Asp Val Asn Met Lys Gly Leu Asp Gly Ile Gln Gly Pro Val Tyr Val
515 520 525

Gly Thr Gly Cys Val Phe Asn Arg Gln Ala Leu Tyr Gly Tyr Gly Pro
530 535 540

Pro Ser Met Pro Ser Phe Pro Lys Ser Ser Ser Ser Ser Cys Ser Cys
545 550 555 560

Cys Cys Pro Gly Lys Lys Glu Pro Lys Asp Pro Ser Glu Leu Tyr Arg
565 570 575

Asp Ala Lys Arg Glu Glu Leu Asp Ala Ala Ile Phe Asn Leu Arg Glu
580 585 590

Ile Asp Asn Tyr Asp Glu Tyr Glu Arg Ser Met Leu Ile Ser Gln Thr
595 600 605

Ser Phe Glu Lys Thr Phe Gly Leu Ser Ser Val Phe Ile Glu Ser Thr
610 615 620

Leu Met Glu Asn Gly Gly Val Ala Glu Ser Ala Asn Pro Ser Thr Leu
625 630 635 640

Ile Lys Glu Ala Ile His Val Ile Gly Cys Gly Tyr Glu Glu Lys Thr
645 650 655

Ala Trp Gly Lys Glu Ile Gly Trp Ile Tyr Gly Ser Val Thr Glu Asp
660 665 670

Ile Leu Thr Gly Phe Lys Met His Cys Arg Gly Trp Arg Ser Ile Tyr
675 680 685

Cys Met Pro Leu Arg Pro Ala Phe Lys Gly Ser Ala Pro Ile Asn Leu
690 695 700

Ser Asp Arg Leu His Gln Val Leu Arg Trp Ala Leu Gly Ser Val Glu
705 710 715 720

Ile Phe Leu Ser Arg His Cys Pro Leu Trp Tyr Gly Phe Gly Gly Gly
725 730 735

Arg Leu Lys Trp Leu Gln Arg Leu Ala Tyr Ile Asn Thr Ile Val Tyr
740 745 750

Pro Phe Thr Ser Leu Pro Leu Ile Ala Tyr Cys Ser Leu Pro Ala Ile
755 760 765

Cys Leu Leu Thr Gly Lys Phe Ile Ile Pro Thr Leu Ser Asn Leu Ala
770 775 780

Ser Val Leu Phe Leu Gly Leu Phe Leu Ser Ile Ile Val Thr Ala Val
785 790 795 800

Leu Glu Leu Arg Trp Ser Gly Val Ser Ile Glu Asp Leu Trp Arg Asn
805 810 815

Glu Gln Phe Trp Val Ile Gly Gly Val Ser Ala His Leu Phe Ala Val
820 825 830

Phe Gln Gly Phe Leu Lys Met Leu Ala Gly Ile Asp Thr Asn Phe Thr
835 840 845

Val Thr Ala Lys Ala Ala Asp Asp Ala Asp Phe Gly Glu Leu Tyr Ile
850 855 860

Val Lys Trp Thr Thr Leu Leu Ile Pro Pro Thr Thr Leu Leu Ile Val
865 870 875 880

Asn Met Val Gly Val Val Ala Gly Phe Ser Asp Ala Leu Asn Lys Gly
885 890 895

Tyr Glu Ala Trp Gly Pro Leu Phe Gly Lys Val Phe Phe Ser Phe Trp
900 905 910

Val Ile Leu His Leu Tyr Pro Phe Leu Lys Gly Leu Met Gly Arg Gln
915 920 925

Asn Arg Thr Pro Thr Ile Val Val Leu Trp Ser Val Leu Leu Ala Ser
930 935 940

Val Phe Ser Leu Val Trp Val Arg Ile Asn Pro Phe Val Ser Thr Ala
945 950 955 960

Asp Ser Thr Thr Val Ser Gln Ser Cys Ile Ser Ile Asp Cys
965 970

(2) INFORMATION FOR SEQ ID NO: 3: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 3311 base pairs
(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA to mRNA
(vi) ORIGINAL SOURCE:

(A) ORGANISM: Gossypium hirsutum L.
(C) INDIVIDUAL ISOLATE: Coker312
(ix) FEATURE:

(A) NAME/KEY: CDS

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3:

UlTIUJl ' llT TTTGGTTTTG CC ATG GCT TCA ACC ACC ATG GCC GCT GGC TTT
Met Ala Ser Thr Thr Met Ala Ala Gly Phe
1 5 10
GGT TCA CTT GCT GTT GAC GAG AAT CGG GGA TCA TCG ACA CAT CAA TCA
Gly Ser Leu Ala Val Asp Glu Asn Arg Gly Ser Ser Thr His Gln Ser
15 20 25

TCA ACG AAA ATA TGC AGG GTG TGT GGG GAT AAG ATC GGG CAA AAG GAA
Ser Thr Lys Ile Cys Arg Val Cys Gly Asp Lys Ile Gly Gln Lys Glu
30 35 40

AAC GGA CAA CCG TTC GTG GCT TGT CAT GTC TGT GCT TTC CCG GTT TGC
Asn Gly Gln Pro Phe Val Ala Cys His Val Cys Ala Phe Pro Val Cys
45 50 55

CGT CCT TGT TAT GAA TAT GAA AGG AGT GAA GGA AAC CAG TGC TGT CCT
Arg Pro Cys Tyr Glu Tyr Glu Arg Ser Glu Gly Asn Gln Cys Cys Pro
60 65 70

CAG TGC AAT ACT CGC TAT AAG CGT CAC AAA GGT AGT CCA AGA ATT TCA
Gln Cys Asn Thr Arg Tyr Lys Arg His Lys Gly Ser Pro Arg Ile Ser
75 80 85 90

GGA GAT GAA GAA GAT GAT TCA GAT CAA GAT GAT TTT GAT GAT GAA TTT
Gly Asp Glu Glu Asp Asp Ser Asp Gln Asp Asp Phe Asp Asp Glu Phe
95 100 105

CAG ATT AAG AAC CGC AAG GAT GAC TCC CAT CCA CAA CAT GAA AAT GAG
Gln Ile Lys Asn Arg Lys Asp Asp Ser His Pro Gln His Glu Asn Glu
110 115 120

GAA TAT AAT AAT AAT AAT CAT CAA TGG CAT CCC AAT GGT CAA GCT TTC
Glu Tyr Asn Asn Asn Asn His Gln Trp His Pro Asn Gly Gln Ala Phe
125 130 135

TCA GTT GCC GGA AGC ACG GCG GGG AAG GAT TTC GAA GGG GAT AAA GAG
Ser Val Ala Gly Ser Thr Ala Gly Lys Asp Leu Glu Gly Asp Lys Glu
140 145 150

ATT TAC GGA AGC GAA GAA TGG AAA GAA AGA GTT GAG AAA TGG AAA GTC 532
Ile Tyr Gly Ser Glu Glu Trp Lys Glu Arg Val Glu Lys Trp Lys Val
155 160 165 170

AGG CAA GAA AAA AGA GGT TTG GTA AGC AAC GAT AAT GGC GGA AAT GAT 580
Arg Gln Glu Lys Arg Gly Leu Val Ser Asn Asp Asn Gly Gly Asn Asp
175 180 185

CCT CCT GAA GAA GAT GAT TAT CTC TTG GCT GAA GCT CGC CAG CCT CTA 628
Pro Pro Glu Glu Asp Asp Tyr Leu Leu Ala Glu Ala Arg Gln Pro Leu
190 195 200

TGG CGA AAA GTG CCA ATT TCG TCA AGT CTG ATA AGC CCT TAC CGG ATA 676
Trp Arg Lys Val Pro Ile Ser Ser Ser Leu Ile Ser Pro Tyr Arg Ile
205 210 215

GTC ATC GTC CTC CGA TTC TTC ATC CTC GCA TTT TTC CTC CGG TTC CGT 724
Val Ile Val Leu Arg Phe Phe Ile Leu Ala Phe Phe Leu Arg Phe Arg
220 225 230

ATT CTA ACA CCC GCC TAC GAC GCT TAC CCG TTA TGG CTA ATC TCT GTC 772
Ile Leu Thr Pro Ala Tyr Asp Ala Tyr Pro Leu Trp Leu Ile Ser Val
235 240 245 250

ATC TGC GAA GTT TGG TTC GCC TTC TCC TGG ATT CTC GAT CAG TTC CCT 820
Ile Cys Glu Val Trp Phe Ala Phe Ser Trp Ile Leu Asp Gln Phe Pro
255 260 265

AAA TGG TTC CCT ATT ACT CGC GAA ACT TAC CTC GAT CGC CTC TCC TTG 868
Lys Trp Phe Pro Ile Thr Arg Glu Thr Tyr Leu Asp Arg Leu Ser Leu
270 275 280

AGG TTC GAA CGT GAA GGA GAG CCC AAT CAA CTT GGC CCC GTC GAC GTC 916
Arg Phe Glu Arg Glu Gly Glu Pro Asn Gln Leu Gly Pro Val Asp Val
285 290 295

TTC GTC AGT ACC GTT GAC CTT CTC AAG GAA CCC CCC ATC ATA ACC GCC 964
Phe Val Ser Thr Val Asp Leu Leu Lys Glu Pro Pro Ile Ile Thr Ala
300 305 310

AAC GCG GTT CTA TCG ATC TTG GCC GTC GAT TAC CCG GTC GAG AAA GTG 1012
Asn Ala Val Leu Ser Ile Leu Ala Val Asp Tyr Pro Val Glu Lys Val
315 320 325 330

TGT TGT TAT GTG TCG GAC GAT GGT GCT TCC ATG CTT CTT TTC GAT TCG 1060
Cys Cys Tyr Val Ser Asp Asp Gly Ala Ser Met Leu Leu Phe Asp Ser
335 340 345

TTG TCT GAA ACG GCT GAG TTC GCG AGG AGA TGG GTT CCG TTT TGT AAG 1108
Leu Ser Glu Thr Ala Glu Phe Ala Arg Arg Trp Val Pro Phe Cys Lys
350 355 360

AAG CAT AAT GTT GAG CCC AGG GCG CCG GAG TTT TAT TTC AAT GAG AAG 1156
Lys His Asn Val Glu Pro Arg Ala Pro Glu Phe Tyr Phe Asn Glu Lys
365 370 375

ATT GAT TAT TTG AAG GAC AAG GTC CAT CCT AGC TTT GTT AAA GAA CGG 1204
Ile Asp Tyr Leu Lys Asp Lys Val His Pro Ser Phe Val Lys Glu Arg
380 385 390

AGA GCC ATG AAA AGG GAA TAT GAA GAA TTT AAA GTA AGG ATC AAT GCA 1252
Arg Ala Met Lys Arg Glu Tyr Glu Glu Phe Lys Val Arg Ile Asn Ala
395 400 405 410

TTA GTA GCA AAA GCT CAG AAG AAA CCA GAA GAA GGA TGG GTC ATG CAA 1300
Leu Val Ala Lys Ala Gln Lys Lys Pro Glu Glu Gly Trp Val Met Gln
415 420 425

Asp Gly Thr Pro Trp Pro Gly Asn Asn Thr Arg Asp His Pro Gly Met
430 435 440

ATT CAG GTC TAT CTA GGA AGT GCC GGT GCA CTC GAT GTG GAT GGC AAA 1396
Ile Gln Val Tyr Leu Gly Ser Ala Gly Ala Leu Asp Val Asp Gly Lys
445 450 455

GAG CTG CCT CGA CTT GTC TAT GTT TCT CGT GAG AAA CGA CCT GGT TAT 1444
Glu Leu Pro Arg Leu Val Tyr Val Ser Arg Glu Lys Arg Pro Gly Tyr
460 465 470

CAG CAC CAT AAG AAA GCC GGT GCT GAG AAT GCT CTG GTT CGA GTT TCT 1492
Gln His His Lys Lys Ala Gly Ala Glu Asn Ala Leu Val Arg Val Ser
475 480 485 490

GCA GTG CTT ACT AAT GCA CCC TTC ATA TTG AAT CTG GAT TGT GAT CAT 1540
Ala Val Leu Thr Asn Ala Pro Phe Ile Leu Asn Leu Asp Cys Asp His
495 500 505

TAC ATC AAC AAT AGC AAG GCC ATG AGG GAA GCG ATG TGC TTT TTA ATG 1588
Tyr Ile Asn Asn Ser Lys Ala Met Arg Glu Ala Met Cys Phe Leu Met
510 515 520

GAT CCT CAG TTT GGA AAG AAG CTT TGT TAT GTT CAA TTT CCA CAG AGA 1636
Asp Pro Gln Phe Gly Lys Lys Leu Cys Tyr Val Gln Phe Pro Gln Arg
525 530 535

TTT GAT GGT ATT GAT CGT CAT GAT CGA TAT GCT AAT CGA AAT GTT GTC 1684
Phe Asp Gly Ile Asp Arg His Asp Arg Tyr Ala Asn Arg Asn Val Val
540 545 550

TTC TTT GAT ATC AAC ATG TTG GGA TTA GAT GGA CTT CAA GGC CCT GTA 1732
Phe Phe Asp Ile Asn Met Leu Gly Leu Asp Gly Leu Gln Gly Pro Val
555 560 565 570

TAT GTA GGC ACA GGG TGT GTT TTC AAC AGG CAG GCA TTG TAT GGC TAC 1780
Tyr Val Gly Thr Gly Cys Val Phe Asn Arg Gln Ala Leu Tyr Gly Tyr
575 580 585

GAT CCA CCA GTC TCT GAG AAA CGA CCA AAG ATG ACA TGT GAT TGC TGG 1828
Asp Pro Pro Val Ser Glu Lys Arg Pro Lys Met Thr Cys Asp Cys Trp
590 595 600

CCT TCT TGG TGT TGC TGT TGT TGC GGA GGT TCT AGG AAG AAA TCA AAG 1876
Pro Ser Trp Cys Cys Cys Cys Cys Gly Gly Ser Arg Lys Lys Ser Lys
605 610 615

AAG AAA GGT GAA AAG AAG GGC TTA CTC GGA GGT CTT TTA TAC GGA AAA 1924
Lys Lys Gly Glu Lys Lys Gly Leu Leu Gly Gly Leu Leu Tyr Gly Lys
620 625 630

AAG AAG AAG ATG ATG GGC AAA AAC TAT GTG AAA AAA GGG TCT GCA CCA 1972
Lys Lys Lys Met Met Gly Lys Asn Tyr Val Lys Lys Gly Ser Ala Pro
635 640 645 650

GTC TTT GAT CTC GAA GAA ATC GAA GAA GGG CTT GAA GGA TAC GAA GAA 2020
Val Phe Asp Leu Glu Glu Ile Glu Glu Gly Leu Glu Gly Tyr Glu Glu
655 660 665

TTG GAG AAA TCG ACA TTA ATG TCG CAG AAG AAT TTC GAG AAA CGA TTC 2068
Leu Glu Lys Ser Thr Leu Met Ser Gln Lys Asn Phe Glu Lys Arg Phe
670 675 680

GGA CAA TCA CCG GTT TTC ATT GCC TCA ACT TTG ATG GAA AAT GGT GGC 2116
Gly Gln Ser Pro Val Phe Ile Ala Ser Thr Leu Met Glu Asn Gly Gly
685 690 695

CTT CCT GAA GGA ACT AAT TCC ACA TCA CTG ATT AAA GAG GCC ATT CAC 2164
Leu Pro Glu Gly Thr Asn Ser Thr Ser Leu Ile Lys Glu Ala Ile His
700 705 710

GTA ATT AGC TGT GGT TAT GAA GAA AAA ACT GAG TGG GGC AAA GAG ATC 2212
Val Ile Ser Cys Gly Tyr Glu Glu Lys Thr Glu Trp Gly Lys Glu Ile
715 720 725 730

GGA TGG ATT TAT GGG TCG GTG ACG GAA GAT ATA TTA ACA GGT TTC AAG 2260
Gly Trp Ile Tyr Gly Ser Val Thr Glu Asp Ile Leu Thr Gly Phe Lys
735 740 745

ATG CAT TGT AGA GGG TGG AAA TCG GTT TAT TGT GTA CCG AAA AGA CCG 2308
Met His Cys Arg Gly Trp Lys Ser Val Tyr Cys Val Pro Lys Arg Pro
750 755 760

GCA TTC AAA GGG TCC GCT CCA ATC AAT CTC TCG GAT CGG TTG CAC CAA 2356
Ala Phe Lys Gly Ser Ala Pro Ile Asn Leu Ser Asp Arg Leu His Gln
765 770 775

GTT TTG AGA TGG GCA CTT GGT TCT GTA GAA ATT TTC CTT AGT CGT CAC 2404
Val Leu Arg Trp Ala Leu Gly Ser Val Glu Ile Phe Leu Ser Arg His
780 785 790

TGT CCA CTT TGG TAT GGT TAT GGT GGA AAA CTG AAA TGG CTC GAG AGG 2452
Cys Pro Leu Trp Tyr Gly Tyr Gly Gly Lys Leu Lys Trp Leu Glu Arg
795 800 805 810

CTT GCT TAT ATC AAC ACC ATT GTT TAC CCT TTC ACC TCG ATC CCT TTA 2500
Leu Ala Tyr Ile Asn Thr Ile Val Tyr Pro Phe Thr Ser Ile Pro Leu
815 820 825

CTC GCC TAT TGT ACT ATT CCA GCT GTT TGT CTT CTC ACC GGC AAA TTC 2548
Leu Ala Tyr Cys Thr Ile Pro Ala Val Cys Leu Leu Thr Gly Lys Phe
830 835 840

ATC ATT CCA ACT CTA AGC AAC CTT ACA AGT GTG TGG TTC TTG GCA CTT 2596
Ile Ile Pro Thr Leu Ser Asn Leu Thr Ser Val Trp Phe Leu Ala Leu
845 850 855

Phe Leu Ser Ile Ile Ala Thr Gly Val Leu Glu Leu Arg Trp Ser Gly
860 865 870

GTT AGC ATC CAA GAC TGG TGG CGC AAT GAA CAA TTC TGG GTC ATC GGA 2692
Val Ser Ile Gln Asp Trp Trp Arg Asn Glu Gln Phe Trp Val Ile Gly
875 880 885 890

GGT GTC TCC GCC CAT CTT TTT GCT GTC TTC CAG GGC CTC CTC AAA GTC 2740
Gly Val Ser Ala His Leu Phe Ala Val Phe Gln Gly Leu Leu Lys Val
895 900 905

CTA GCT GGA GTA GAC ACC AAC TTC ACC GTA ACA GCA AAA GCA GCA GAC 2788
Leu Ala Gly Val Asp Thr Asn Phe Thr Val Thr Ala Lys Ala Ala Asp
910 915 920

GAT ACA GAA TTC GGT GAA CTT TAT CTC TTC AAA TGG ACA ACT CTC TTA 2836
Asp Thr Glu Phe Gly Glu Leu Tyr Leu Phe Lys Trp Thr Thr Leu Leu
925 930 935

ATC CCT CCC ACA ACT CTG ATA ATA CTG AAC ATG GTC GGA GTC GTG GCC 2884
Ile Pro Pro Thr Thr Leu Ile Ile Leu Asn Met Val Gly Val Val Ala
940 945 950

GGA GTT TCA GAC GCA ATC AAC AAC GGC TAT GGT TCA TGG GGT CCA TTG 2932
Gly Val Ser Asp Ala Ile Asn Asn Gly Tyr Gly Ser Trp Gly Pro Leu
955 960 965 970

TTC GGC AAA CTG TTC TTC GCA TTC TGG GTC ATT CTT CAT CTT TAC CCA 2980
Phe Gly Lys Leu Phe Phe Ala Phe Trp Val Ile Leu His Leu Tyr Pro
975 980 985

TTC CTC AAA GGT TTG ATG GGG AGA CAA AAC AGG ACG CCC ACC ATT GTT 3028
Phe Leu Lys Gly Leu Met Gly Arg Gln Asn Arg Thr Pro Thr Ile Val
990 995 1000

GTG CTT TGG TCC ATA CTT TTG GCA TCG ATT TTC TCA CTG GTT TGG GTA 3076
Val Leu Trp Ser Ile Leu Leu Ala Ser Ile Phe Ser Leu Val Trp Val
1005 1010 1015

CGG ATC GAT CCC TTC TTG CCC AAA CAA ACA GGT CCA GTT CTT AAA CAA 3124
Arg Ile Asp Pro Phe Leu Pro Lys Gln Thr Gly Pro Val Leu Lys Gln
1020 1025 1030

TGT GGC GTG GAG TGC TAAATGGTGT TTTACAAACC TTTCTTATTA TTTTATTTTC 3179
Cys Gly Val Glu Cys
1035

CCTTTTVaCC ACTACTGTTG ATTTGCTGTG ATTCTAAAAG GGATTTATCT TGTTTGTAAA 3239 
AAGTCTOCTA TGATTTTGTT GOTCAATTT AATTTCTATA TOGTAAAAAA ATATTTCnT 3299 
AAATTAACTA TA 

3311 

(2) INFORMATION FOR SEQ ID NO: 4:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1039 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4: 
Met Ala Ser Thr Thr Met Ala Ala Gly Phe Gly Ser Leu Ala Val Asp
1 5 10 15

Glu Asn Arg Gly Ser Ser Thr His Gln Ser Ser Thr Lys Ile Cys Arg
20 25 30

Val Cys Gly Asp Lys Ile Gly Gln Lys Glu Asn Gly Gln Pro Phe Val
35 40 45

Ala Cys His Val Cys Ala Phe Pro Val Cys Arg Pro Cys Tyr Glu Tyr
50 55 60

Glu Arg Ser Glu Gly Asn Gln Cys Cys Pro Gln Cys Asn Thr Arg Tyr
65 70 75 80

Lys Arg His Lys Gly Ser Pro Arg Ile Ser Gly Asp Glu Glu Asp Asp
85 90 95

Ser Asp Gln Asp Asp Phe Asp Asp Glu Phe Gln Ile Lys Asn Arg Lys
100 105 110

Asp Asp Ser His Pro Gln His Glu Asn Glu Glu Tyr Asn Asn Asn Asn
115 120 125

His Gln Trp His Pro Asn Gly Gln Ala Phe Ser Val Ala Gly Ser Thr
130 135 140

Ala Gly Lys Asp Leu Glu Gly Asp Lys Glu Ile Tyr Gly Ser Glu Glu
145 150 155 160

Trp Lys Glu Arg Val Glu Lys Trp Lys Val Arg Gln Glu Lys Arg Gly
165 170 175

Leu Val Ser Asn Asp Asn Gly Gly Asn Asp Pro Pro Glu Glu Asp Asp
180 185 190

Tyr Leu Leu Ala Glu Ala Arg Gln Pro Leu Trp Arg Lys Val Pro Ile
195 200 205

Ser Ser Ser Leu Ile Ser Pro Tyr Arg Ile Val Ile Val Leu Arg Phe
210 215 220

Phe Ile Leu Ala Phe Phe Leu Arg Phe Arg Ile Leu Thr Pro Ala Tyr
225 230 235 240

Asp Ala Tyr Pro Leu Trp Leu Ile Ser Val Ile Cys Glu Val Trp Phe
245 250 255

Ala Phe Ser Trp Ile Leu Asp Gln Phe Pro Lys Trp Phe Pro Ile Thr
260 265 270

Arg Glu Thr Tyr Leu Asp Arg Leu Ser Leu Arg Phe Glu Arg Glu Gly
275 280 285

Glu Pro Asn Gln Leu Gly Pro Val Asp Val Phe Val Ser Thr Val Asp
290 295 300

Leu Leu Lys Glu Pro Pro Ile Ile Thr Ala Asn Ala Val Leu Ser Ile
305 310 315 320

Leu Ala Val Asp Tyr Pro Val Glu Lys Val Cys Cys Tyr Val Ser Asp
325 330 335

Asp Gly Ala Ser Met Leu Leu Phe Asp Ser Leu Ser Glu Thr Ala Glu
340 345 350

Phe Ala Arg Arg Trp Val Pro Phe Cys Lys Lys His Asn Val Glu Pro
355 360 365

Arg Ala Pro Glu Phe Tyr Phe Asn Glu Lys Ile Asp Tyr Leu Lys Asp
370 375 380

Lys Val His Pro Ser Phe Val Lys Glu Arg Arg Ala Met Lys Arg Glu
385 390 395 400

Tyr Glu Glu Phe Lys Val Arg Ile Asn Ala Leu Val Ala Lys Ala Gln
405 410 415

Lys Lys Pro Glu Glu Gly Trp Val Met Gln Asp Gly Thr Pro Trp Pro
420 425 430

Gly Asn Asn Thr Arg Asp His Pro Gly Met Ile Gln Val Tyr Leu Gly
435 440 445

Ser Ala Gly Ala Leu Asp Val Asp Gly Lys Glu Leu Pro Arg Leu Val
450 455 460

Tyr Val Ser Arg Glu Lys Arg Pro Gly Tyr Gln His His Lys Lys Ala
465 470 475 480

Gly Ala Glu Asn Ala Leu Val Arg Val Ser Ala Val Leu Thr Asn Ala
485 490 495

Pro Phe Ile Leu Asn Leu Asp Cys Asp His Tyr Ile Asn Asn Ser Lys
500 505 510

Ala Met Arg Glu Ala Met Cys Phe Leu Met Asp Pro Gln Phe Gly Lys
515 520 525

Lys Leu Cys Tyr Val Gln Phe Pro Gln Arg Phe Asp Gly Ile Asp Arg